Abstract

This report examines how to estimate the parameters of a chaotic system given noisy observations of 
the state behavior of the system. Investigating parameter estimation for chaotic systems is interesting 
because of possible applications for high-precision measurement and for use in other signal processing, 
communication, and control applications involving chaotic systems. 

In this report, we examine theoretical issues regarding parameter estimation in chaotic systems and develop 
an efficient algorithm to perform parameter estimation. We discover two properties that are helpful for 
performing parameter estimation on non-structurally stable systems. First, it turns out that most data in 
a time series of state observations contribute very little information about the underlying parameters of 
a system, while a few sections of data may be extraordinarily sensitive to parameter changes. Second, for 
one-parameter families of systems, we demonstrate that there is often a preferred direction in parameter 
space governing how easily trajectories of one system can "shadow" trajectories of nearby systems. This 
asymmetry of shadowing behavior in parameter space is proved for certain families of maps of the interval. 
Numerical evidence indicates that similar results may be true for a wide variety of other systems. 
Using the two properties cited above, we devise an algorithm for performing parameter estimation. Standard
parameter estimation techniques such as the extended Kalman filter perform poorly on chaotic
systems because of divergence problems. The proposed algorithm achieves accuracies several orders of 
magnitude better than the Kalman filter and has good convergence properties for large data sets. In some 
systems the algorithm converges at a rate proportional to $1/n^2$, where $n$ is the number of state samples
processed. This is significantly better than the $1/\sqrt{n}$ convergence one would expect from nonchaotic oscillators
based on purely stochastic considerations.
Chapter 1



Introduction 



In this report we investigate theoretical limitations and develop computational methods 
for estimating the parameters of a chaotic system given a noisy time series of state data 
about the system. There are two primary reasons why we are interested in parameter 
estimation of chaotic time series. First, there has been considerable interest in recent 
years regarding signal processing and control applications involving chaotic systems (see 
e.g., [11], [49], [9]). Parameter estimation has traditionally been an important problem in 
signal processing and control theory, so in light of recent applications involving chaotic 
systems, it is important to investigate what happens when the signals and systems 
involved are chaotic. 

Second, it has been suggested that parameter estimation in chaotic systems may have 
applications for high-precision measurement. In particular the idea is that if a system 
is chaotic and displays a sensitive dependence on initial conditions, then it can also be 
sensitive to small changes in parameter values. Consequently, development of successful 
parameter estimation techniques could make it possible to measure the parameters of a 
system extremely accurately given a time series of data about the state of the system. 

Our goal in this report is to systematically explore the feasibility of parameter
estimation in chaotic systems, including a theoretical analysis of what accuracies we can
reasonably expect to obtain and what factors limit this accuracy. We also present new
numerical algorithms for estimating the parameters of chaotic systems and discuss
simulations demonstrating the performance of the algorithms.

It turns out that the parameter estimation problem is especially interesting because 
it is simple enough that one can look carefully at the underlying dynamical mechanisms 
that affect the feasibility and efficiency of various numerical approaches. This is in 
contrast with a number of typical research problems involving chaotic time series which 
are broad enough that heuristics must generally be relied upon to attack the problem 
numerically. On the other hand, the parameter estimation problem is also complex
enough that the results are interesting, and in some cases, quite unexpected. As we shall
see, a close examination of the relationship between system dynamics and parameter 
estimation reveals interesting observations that greatly aid in the development of an 
efficient numerical approach. 



1.1 The problem 

Before proceeding further, we should be more explicit about what is meant by "parameter
estimation." Basically, the idea is the following: Suppose that we are given a
parameterized family of mappings, $f_p(x)$, where $x$ is the state vector of the system and
$p$ are some invariant parameters of the system. We will assume that $f_p(x)$ varies
continuously with $x$ and $p$. Further, suppose that we are given a sequence of observations,¹
$\{y_n\}$, of a certain state orbit,² $\{x_n\}$, where:

\[ x_{n+1} = f_p(x_n) \]

and

\[ y_n = x_n + v_n \]

for all integers $n$, where $v_n$ represents measurement errors in the data stream, $\{y_n\}$. We
are interested in how to estimate the value of $p$ given a stream of data, $\{y_n\}$. Note that
we will concentrate on the discrete-time formulation, but the results apply analogously to
continuous-time systems. For example, one might imagine that time is one of the state
variables of the system, and that the $y_n$'s represent samples of a continuous-time system.

For analytic purposes it is helpful to assume that, to first approximation, the magnitude
of the measurement errors is bounded so that:

\[ |v_n| < \epsilon \]

for some $\epsilon > 0$. For purposes of analyzing and evaluating algorithms, it will also be
useful later to think of $v_n$ as a random variable with various probability densities.
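The observation model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `noisy_observations`, the choice of a uniform density for the bounded noise, and the use of the quadratic map (introduced in Section 1.2) as the example system are ours and are not prescribed by the problem statement.

```python
import numpy as np

def noisy_observations(f, p, x0, n_steps, eps, rng):
    """Iterate x_{n+1} = f(x_n, p) and return (x, y) with y_n = x_n + v_n, |v_n| <= eps."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        x[n + 1] = f(x[n], p)
    # Bounded measurement noise; the uniform density is an assumption for illustration.
    v = rng.uniform(-eps, eps, size=x.shape)
    return x, x + v

def quadratic_map(x, p):
    """Example system: f_p(x) = p x (1 - x)."""
    return p * x * (1.0 - x)

rng = np.random.default_rng(0)
x, y = noisy_observations(quadratic_map, p=3.9, x0=0.3, n_steps=100, eps=1e-3, rng=rng)
```

Only the sequence `y` would be available to an estimator; the orbit `x` and the parameter `p` are the hidden quantities.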



1.2 Preview of important issues 

Parameter estimation and shadowing 

Let us now try to get a flavor for some of the important issues that govern the
performance of parameter estimation techniques. First of all, given a family of mappings
of the form, $f_p$, and a noisy stream of state data, $\{y_n\}$, we would like to know which $f_p$'s
have orbits that closely shadow or follow $\{y_n\}$. We know that $\{y_n\}$ represents an actual
orbit of $f_p$ for some value of $p$, with measurement errors of magnitude $\epsilon$ added in. Thus,
if no orbit of $f_{p_0}$ shadows $\{y_n\}$ to within $\epsilon$ for a particular parameter value, $p_0$, then
$p_0$ cannot be the actual parameter value of the system that is being observed. On the
other hand, if many systems of the form, $f_p$, have orbits that closely shadow $\{y_n\}$, then
it would be difficult to tell from the state data which of these systems is actually being
observed.

¹ Instead of writing $\{x_n\}_{n=0}^{\infty}$, we will sometimes write $\{x_n\}$ to denote an infinite sequence of states.

² We will refer to a sequence of states, $\{x_n\}_{n=0}^{\infty}$, as an orbit of the map $f$ if $x_{n+1} = f(x_n)$ for all integers
$n$. Finite sections of infinite orbits, for example $\{x_n\}_{n=0}^{N}$ for some $N > 0$, may also be referred to as
orbits.

It turns out that a significant body of work is available to answer questions like, 
"what types of systems are insensitive to small perturbations so that orbits of perturbed 
systems shadow orbits of the original system and vice versa?" However, many of the 
results in this direction are topological in nature; that is, they answer questions like 
whether such shadowing orbits must exist or not for certain classes of systems. On the 
other hand, in order to evaluate the possibilities for parameter estimation, it is also 
important to know more geometrically-oriented results like, "how closely do shadowing 
orbits follow each other for nearby systems in parameter space" and "how long do orbits 
of nearby systems follow each other if the orbits do not shadow each other forever." 
Such results tend to be more difficult to establish and also depend more specifically on 
the systems involved. 

An example in one dimension 

Investigating the geometry of shadowing orbits can yield some interesting results. 
For example, consider the family of maps: 

\[ f_p(x) = p x (1 - x) \tag{1.1} \]

for $x \in [0, 1]$ and $p \in [0, 4]$. Henceforth we will refer to the family of maps (1.1) as simply
the family of quadratic maps.

It is known ([5]) that for a non-negligible set of parameter values, the quadratic 
maps in (1.1) produce chaotic behavior for almost all initial conditions, meaning that 
orbits tend to explore intervals in state space, and nearby orbits experience exponential 
local divergence (i.e., positive Lyapunov exponents). Suppose that we pick $p_0 = 3.9$ and
iterate an orbit, $\{x_n\}$, of $f_{p_0}$ starting with the initial condition $x_0 = 0.3$. Numerically,
the resulting orbit appears to be chaotic and exhibits the properties cited above, at least
for large numbers of iterates. Now consider the question: "What parameter values, $p$,
produce orbits that shadow $\{x_n\}$ for many iterations of $f_p$?" We can get some idea of the
answer to this question by simply picking various values for $p$ near 3.9 and attempting
to numerically find orbits that shadow $\{x_n\}$. There are a number of issues (see Chapter
5) about how to do this.³ However, let us for the moment simply assume that the results
we present are at least qualitatively correct.
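The "exponential local divergence" mentioned above can be checked numerically for $p_0 = 3.9$ and $x_0 = 0.3$. The sketch below is our own illustration, not the procedure used in this report: it estimates the Lyapunov exponent as the orbit average of $\log |f_{p_0}'(x_n)| = \log |p_0 (1 - 2 x_n)|$ and tracks the separation of two orbits whose initial conditions differ by $10^{-10}$. The function names, the transient length, and the perturbation size are assumptions chosen for illustration.

```python
import math

def quadratic_map(x, p):
    return p * x * (1.0 - x)

def orbit(p, x0, n_steps):
    """Return [x_0, x_1, ..., x_{n_steps}] for the quadratic map f_p."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(quadratic_map(xs[-1], p))
    return xs

def lyapunov_estimate(p, x0, n_steps, n_transient=100):
    """Average of log|f_p'(x_n)| along the orbit, after discarding a transient."""
    x = x0
    for _ in range(n_transient):
        x = quadratic_map(x, p)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(abs(p * (1.0 - 2.0 * x)))
        x = quadratic_map(x, p)
    return total / n_steps

P0, X0 = 3.9, 0.3
lam = lyapunov_estimate(P0, X0, 10000)

# Two orbits of the same map whose initial conditions differ by 1e-10.
a = orbit(P0, X0, 60)
b = orbit(P0, X0 + 1e-10, 60)
sep = [abs(u - v) for u, v in zip(a, b)]
```

A positive value of `lam` and a separation `sep` that grows from about $10^{-10}$ to order one within a few dozen iterates are consistent with the chaotic behavior described in the text.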



³ For example, note that because we cannot numerically iterate the orbit $\{x_n\}$ accurately for many
iterations, one could argue that the experiment is dominated by roundoff errors. However, while our




Figures 1.1 and 1.2 show the result of carrying out the described numerical experiment
with $p_0 = 3.9$ and $x_0 = 0.3$. For values of $p$ close to $p_0$, we attempt to find finite
orbits of $f_p$ that closely follow the $f_{p_0}$ orbit, $\{x_n = f_{p_0}^n(x_0)\}_{n=0}^{N}$, for integers $N > 0$.⁴ In
order to measure how closely maps with different parameters can shadow $\{x_n\}_{n=0}^{N}$, it is
helpful to define $\epsilon(N, p, p_0, x_0)$ to be the maximal distance between the orbit, $\{x_n\}_{n=0}^{N}$,
and the closest shadowing orbit, $\{f_p^n(z_0)\}_{n=0}^{N}$, of $f_p$. In other words, let:

\[ \epsilon(N, p, p_0, x_0) = \inf_{z_0 \in [0,1]} \max_{0 \le n \le N} |f_p^n(z_0) - f_{p_0}^n(x_0)|. \tag{1.2} \]

So, for each $p$ and integer $N > 0$, $\epsilon(N, p, p_0, x_0)$ measures how closely the best possible
shadowing orbit of $f_p$ follows the orbit, $\{x_n\}_{n=0}^{N}$, of $f_{p_0}$. For the purposes of this particular
experiment let $p_0 = 3.9$ and $x_0 = 0.3$ be constant and set $\epsilon(N, p) = \epsilon(N, p, p_0 = 3.9, x_0 = 0.3)$.
There is nothing particularly special about our choice of $p_0 = 3.9$ or $x_0 = 0.3$. As
we shall see later, many other parameter values and initial conditions yield similar results.
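To make definition (1.2) concrete, the following sketch approximates $\epsilon(N, p)$ by brute force: it evaluates the inner maximum on a uniform grid of candidate initial conditions $z_0$ and takes the smallest value. This is a crude stand-in of our own, not the method used to produce the figures. Because nearby orbits separate exponentially, a fixed grid only resolves the infimum for small $N$, and the value returned is an upper bound on the true infimum. The function name and the grid size are assumptions.

```python
import numpy as np

def shadowing_distance(n_max, p, p0=3.9, x0=0.3, grid_size=200001):
    """Grid approximation (an upper bound) to eps(N, p, p0, x0) of (1.2), for small N."""
    z = np.linspace(0.0, 1.0, grid_size)  # candidate initial conditions z_0
    x = x0
    worst = np.abs(z - x)                 # running max over n of |f_p^n(z_0) - f_p0^n(x_0)|
    for _ in range(n_max):
        z = p * z * (1.0 - z)             # advance every candidate orbit of f_p
        x = p0 * x * (1.0 - x)            # advance the reference orbit of f_p0
        worst = np.maximum(worst, np.abs(z - x))
    return float(worst.min())             # best candidate found on the grid

# Example: compare parameters slightly below and slightly above p0 for a short orbit.
eps_below = shadowing_distance(10, 3.9 - 0.01)
eps_above = shadowing_distance(10, 3.9 + 0.01)
```

For the orbit lengths in Figure 1.1 ($N = 61$ and beyond) a grid search of this kind is not adequate, which is one reason the numerical issues deferred to Chapter 5 matter.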

Figure 1.1 shows the result of numerically computing $\epsilon(N, p)$ with respect to $p$ for
three values of $N$. The three v-shaped traces in the figure represent plots of $\epsilon(N, p)$
for $N = 61$, $N = 250$, and $N = 1000$. $\epsilon(N, p)$ is plotted on the y-axis, while $p - p_0$,
the difference in parameter value, $p$, from the original parameter value, $p_0$, is labeled on
the x-axis. Note the distinct asymmetry of the graph between values of $p$ greater than
and less than $p_0 = 3.9$. In fact, for $N = 250$ and $N = 1000$ the graph is so steep for
$p < p_0$ that it looks coincident with the vertical line marking $p - p_0 = 0$. It seems that,
at least in this case, systems with parameter values, $p$, less than $p_0$ do not shadow the
orbit, $\{x_n\}$, nearly as easily as those systems with parameter values greater than $p_0$. In
some sense, it seems that orbits for higher parameter values are more flexible, or have a
greater degree of freedom, than do orbits for slightly lower parameter values.

This phenomenon of asymmetrical shadowing may seem counterintuitive. If an
orbit, $O(p_0)$, of parameter value $p_0$ is shadowed by an orbit, $O(p_0 + \delta)$, of a slightly
higher parameter value, $p_0 + \delta$, then given the orbit, $O(p_0 + \delta)$, of parameter $p_0 + \delta$, isn't
$O(p_0 + \delta)$ shadowed by the orbit, $O(p_0)$, of a lower parameter value, $p_0$? Yes, but as we
shall see, it may be that the set of orbits of $f_{p_0 + \delta}$ that are shadowed by an orbit of $f_{p_0}$ is
actually vanishingly small. That is, if an orbit of $f_{p_0 + \delta}$ is generated by choosing an initial
condition at random, we would find that the probability that that orbit is shadowed by
an orbit of $f_{p_0}$ is zero.

Returning to the example at hand, we find that the asymmetry in parameter space
is even more apparent if we consider how $\epsilon(N, p)$ varies with $N$. Basically we want to



particular numerically generated starting orbit may not look like the actual orbit, $\{x_n\}$, with initial
condition $x_0 = 0.3$ for large values of $n$, we will later see that qualitatively the pictures we present are
similar.

⁴ Here we let $f^{n+1} = f(f^n)$, so that the function, $f^n$, refers to the composition of $f$ with itself $n$
times (define $f^0$ to be simply the identity function). Note that if $x_{n+1} = f(x_n)$ for all integers $n$, then
$x_n = f^n(x_0)$ for all $n$.




keep track of how the curves in figure 1.1 move inward toward the vertical line, $p = p_0$,
as $N$ increases. We can do this by fixing a constant, $\epsilon_0$, and keeping track of which
parameter values, $p$, satisfy $\epsilon(N, p) < \epsilon_0$ for varying values of $N$. For example, for a
particular value of $\epsilon_0$, suppose that $I_N$ is the maximal interval in parameter space such
that $p_0 \in I_N$ and $\epsilon(N, p) < \epsilon_0$ for all $p \in I_N$. We are interested in what fraction of the
interval, $I_N$, is greater than or less than the original parameter value, $p_0$. To keep track
of this let $I_N = I_N^- \cup I_N^+$ so that $I_N^- = [p_N^-, p_0]$ and $I_N^+ = [p_0, p_N^+]$, where $p_N^- \le p_0$ and
$p_N^+ \ge p_0$. Let $a(N)$ be the length of $I_N^-$ and let $b(N)$ be the length of $I_N^+$. Figure 1.2 shows
graphs of $a(N)$ and $b(N)$ with respect to $N$ as computed numerically for $\epsilon_0 = 0.01$. Note
that the scale for $a(N)$ and $b(N)$ on the y-axis is logarithmic, so that $a(N)$ is several
orders of magnitude smaller than $b(N)$ for larger values of $N$, reflecting the asymmetry in
parameter space. Also, we see that $a(N)$ and $b(N)$ both appear approximately constant
for large stretches of $N$ except where $a(N)$ decreases in large increments over a small
number of iterates. We will later see that these decreases in $a(N)$ occur along short
stretches of the orbit, $\{x_n\}$, where small differences in the parameter value of the system
can easily be distinguished by even noisy state data.

Applying theory to develop estimation algorithms 

Figures 1.1 and 1.2 illustrate two interesting properties for the quadratic map example:
(1) there is an asymmetry in the shadowing behavior of maps in parameter space,
and (2) most iterates of a specific orbit are apparently not very sensitive to small changes 
in parameter values, while a few special iterates may be especially sensitive to parameter 
changes. It turns out that these two properties can be extremely helpful in developing 
an algorithm to do parameter estimation. 

First of all, the asymmetry illustrated in figure 1.1 can be quite helpful. For instance,
in the example we just considered, few maps, $f_p$, with parameter values lower than $p_0$
have orbits that can shadow the given orbit of $f_{p_0}$. Suppose that we are given noisy
measurements of a state orbit, $\{x_n\}$. If we find that only maps from a certain interval
in parameter space can shadow the observed data, then the real parameter value should
be close to the lower endpoint of this parameter range. Thus, to first order, if $\epsilon$ is the
magnitude of measurement error, the error in the parameter estimate is approximately
governed by either $a(N)$ or $b(N)$, whichever one happens to be smaller.

In addition, we will see later that figure 1.2 reflects the fact that a few sections of 
the observed state data stream contribute greatly to our knowledge of the parameters of 
the system, while much of the rest of the data contributes almost no new information. 
Thus, if we can quickly sift through all the useless data and examine the critical data 
very carefully, we should be able to vastly improve a parameter estimation technique. 

The key to this is whether or not physically interesting systems have the properties 
described above. A major objective of this report will be to investigate the relevant 
mechanisms behind the two properties and explore what types of systems might exhibit 
these properties. We will then investigate how to take advantage of these two properties
to produce superior parameter estimation algorithms.

Figure 1.1: Graph of best shadowing distance, $\epsilon(N, p)$, with respect to $p$, for $f_p(x) = p x (1 - x)$,
$p_0 = 3.9$ and $x_0 = 0.3$. $\epsilon(N, p)$ measures how closely an orbit of $f_p$ can shadow a fixed orbit,
$\{x_n\}_{n=0}^{N}$, of $f_{p_0}$. On the x-axis of the graph, $p$ is labeled as $p - p_0$, the difference in parameter
value from the parameter used to generate $\{x_n\}_{n=0}^{N}$. $\epsilon(N, p)$ is plotted on the y-axis with $N$
held constant for three different values of $N$. The three v-shaped curves represent $\epsilon(N, p)$ for
$N = 61$, $N = 250$, and $N = 1000$. Note the distinct asymmetry in how well orbits of $f_p$ track
$\{x_n\}_{n=0}^{N}$ for $p > p_0$ and $p < p_0$.

Figure 1.2: Graph of $a(N)$ and $b(N)$ with respect to $N$ for $\epsilon_0 = 0.01$. $a(N)$ is a measure of
the number of parameter values, $p < p_0$, such that there exists an orbit of $f_p$ that can shadow
the orbit, $\{x_n\}_{n=0}^{N}$, of $f_{p_0}$ with less than $\epsilon_0$ error. Similarly, $b(N)$ measures the number of
parameter values, $p > p_0$, such that $f_p$ has an orbit that can shadow the orbit, $\{x_n\}_{n=0}^{N}$, with less than $\epsilon_0$
error.



1.3 New results and previous work 
1.3.1 Dynamical theory and shadowing orbits 

There has not been much work directly attacking the parameter estimation problem for 
chaotic systems. However, as we saw in the previous section, the feasibility of parameter 
estimation is closely related to the concept of shadowing orbits. 

For uniformly hyperbolic systems, it is well known that orbits of perturbed systems
shadow orbits of the original system forever ([7], [4]). Applying this result to parameter
estimation, we find that one cannot expect to get accurate information about the
parameters of a hyperbolic system based on state data, since it is difficult to distinguish
orbits from systems with two nearby parameter values.

However, most physically interesting chaotic systems are not in fact hyperbolic. In
general,⁵ one can only expect so-called subexponentially hyperbolic behavior (see, e.g.,
[52]), so that hyperbolicity on a state orbit is available on a local scale, but is not uniform
over an infinite orbit. The result is that most finite pseudo-orbits⁶ of a system can be
shadowed closely by a real orbit of that system. This observation was made in [24], 
where attempts were also made to establish bounds on the shadowing behavior of finite 
orbits in nonhyperbolic systems by using linearization to exploit the locally hyperbolic 
behavior along a typical state orbit. Such work received interest because shadowing was 
thought of as a helpful property that lends credibility to computer-generated orbits with 
roundoff error. 

In the case of parameter estimation, the hyperbolic degeneracies that prevent
shadowing behavior are in fact the focus of most of the interest. This is unlike past work
involving shadowing orbits, because in order to investigate the feasibility of parameter
estimation, it is important to specifically examine the mechanism behind the lack of
shadowing behavior in nonhyperbolic systems. In addition, it is also necessary to
examine carefully how orbits for one parameter value can shadow orbits for systems with a
continuum of different parameter values.

The result is that we find that most measurements of the state of a system contain 
comparatively little information about the parameters of the system except for those 
iterates where the hyperbolic behavior of a system becomes degenerate. This is the 
phenomenon we observed with the quadratic map. 



⁵ For example, for almost all $C^2$ diffeomorphisms.

⁶ A pseudo-orbit of a map, $g$, is a sequence of states $\{z_n\}$ such that $z_{n+1} = g(z_n) + v_n$ for all $n$,
where the magnitude of the noise, $|v_n|$, is assumed to be small.






In this report, we discuss how the lack of shadowing behavior seems to be the result 
of a mechanism which shall be referred to as folding in state space. It also seems that this 
folding behavior tends naturally to result in one-sided shadowing behavior in parameter 
space, making it possible to effectively distinguish parameter values near areas where 
folding occurs. 

For one-dimensional maps like the quadratic map, we have been able to characterize
the results quantitatively. For example, for the quadratic map, $f_p(x) = p x (1 - x)$, we
show that the following is true:

Proposition: Let

\[ \epsilon(p, p_0, x_0) = \lim_{N \to \infty} \epsilon(N, p, p_0, x_0) \]

where $\epsilon(N, p, p_0, x_0)$ is as given in (1.2). There exist constants $\delta > 0$, $C > 0$, and $K > 0$
such that the following is true: For any $\gamma \in (0, 1)$, there is a set, $E(\gamma) \subset [0, 4]$, of positive
Lebesgue measure such that if $p_0 \in E(\gamma)$, then:

(1) For $x_0 \in [0, 1]$,

\[ \epsilon(p, p_0, x_0) < C |p - p_0|^{[\text{exponent illegible in source}]} \]

for all $p \in (p_0, p_0 + \delta)$.

(2) For almost all $x_0 \in [0, 1]$,

\[ \epsilon(p, p_0, x_0) > K |p - p_0|^{\gamma} \]

for all $p \in (p_0 - \delta, p_0)$.

This follows from Theorem 3.4.2.

From the proposition we see that there can in fact be a pronounced asymmetry in 
the shadowing behavior of orbits in parameter space and that this phenomenon is quite 
prevalent. For the quadratic maps (1.1) with positive Lyapunov exponents, it can also 
be shown that the asymmetry always favors one particular direction in parameter space 
for maps. That is, it is always easier for orbits of maps with slightly higher parameters 
to shadow orbits of maps with slightly lower parameters. 

For more complicated systems, like systems in higher dimensions, it is more difficult
to establish definite analytical results. However, we present numerical results that
demonstrate that surprisingly many systems have the properties discussed, namely that 
(1) a small fraction of the data contains most of the information about the parameters 
of the system, and (2) there is an asymmetry in the behavior of shadowing orbits in 
parameter space. 

1.3.2 Parameter estimation techniques

Traditionally, parameter estimation is carried out numerically using algorithms like the 
extended Kalman filter. However, we will demonstrate that algorithms like the extended 
Kalman filter that linearize state and parameter space around a certain trajectory
actually perform worse than one might expect simply from linearization errors. This is
basically because most of the information about the parameters is contained in a small
number of data points, the very data points where nonlinear folding behavior is most
important. Techniques like the extended Kalman filter have a difficult time modeling the 
folding behavior of state space with these data points, along with the local exponential 
expansion and contraction properties of state space in chaotic systems. The result is 
that these algorithms typically diverge. In other words, the algorithm's estimate of the 
error in its parameter estimate quickly becomes much less than the actual error, so that 
the algorithm ends up converging to the wrong parameter value. 

In this report, we describe a new algorithm for performing parameter estimation on 
chaotic systems and show numerical results demonstrating the effectiveness of the new 
algorithm and comparing the algorithm with traditional techniques. The new algorithm 
attempts to sift through most of the data quickly, concentrating on the measurements 
that are most sensitive to parameter values. The algorithm then uses a technique, based 
on a Monte Carlo method, to pick out a parameter estimate by taking advantage of the 
fact that shadowing behavior tends to be asymmetrical in parameter space. 



1.4 Overview 

This report is divided into two major parts. The first part, which includes chapters 
2-4, discusses theoretical results concerning parameter estimation in chaotic systems. In 
particular, we are interested in questions like: (1) What possible constraints are there
to the accuracy of parameter estimates, and what kind of accuracy can one expect given 
large amounts of data? (2) How is the accuracy of a parameter estimate likely to depend 
on the magnitude of the measurement error and the number of state measurements 
available? (3) What types of systems exhibit the most sensitivity to small parameter 
changes, and what types of systems are likely to produce the most (and least) accurate 
parameter estimates? Basically we want to understand exactly how much information 
state samples actually contain about the parameters of various types of systems. 

In order to answer these questions, we first examine how parameter estimation relates 
to well-known concepts like shadowing, hyperbolicity, and structural stability. Chapter 
2 discusses how the established theory concerning these concepts relates to the problem 
of parameter estimation. We also examine what types of systems are guaranteed to have 
topologically stable sorts of behavior and how this constrains our ability to do parameter 
estimation. 

In chapter 3, we examine one-dimensional maps. Because of the relative simplicity
of these systems, they are ideal for investigating how the specific geometry of a system 
relates to parameter estimation, especially when one is dealing with systems that are not 
topologically or structurally stable. New quantitative results are obtained concerning 
how orbits for nearby parameter values shadow each other in certain one-dimensional 
families of maps. 

In chapter 4 we examine non-uniformly hyperbolic systems of dimension greater than 
one. In such general settings it is difficult to make quantitative statements concerning 
limits to parameter estimation. However, we extend ideas from the analysis of
one-dimensional systems to suggest mechanisms that determine the shadowing behavior
of orbits. These mechanisms result from an examination of the stable and unstable 
manifolds of the systems. Although the conjectures we make are not rigorously proved, 
they are supported by numerical evidence. 

The second major part of the report (comprising chapter 5) describes an effort to
use the dynamical systems theory to develop a reasonable algorithm to numerically
estimate the parameters of a system given noisy state samples. We discuss why traditional
methods of parameter estimation have problems, and some ways to fix these problems.

In chapter 6 we present numerical results demonstrating the effectiveness of the new 
estimation techniques proposed. 

Chapter 7 summarizes the main conclusions of this report, and suggests possible 
future work. 
Chapter 2

Parameter estimation, shadowing, 
and structural stability 



In this chapter we review a variety of established mathematical results and apply these
results to an analysis of parameter estimation. In particular, we examine how topological
stability results for certain types of systems constrain the feasibility of parameter
estimation.



2.1 Preliminaries and definitions 

In this section, we introduce some of the basic definitions and tools needed to analyze
problems related to parameter estimation. We begin by restating a mathematical
description of the problem. We are given the family of discrete mappings, $f_p : M \to M$,
where $M$ is a smooth compact manifold and $p$ represents the invariant parameters of
the system. For the purposes of this report, we will also assume that $p$ is a scalar so
that $f_p$ represents a one-parameter family of maps for $p \in I_p$, where $I_p \subset \mathbb{R}$ is a closed
interval of the real line. Note that it will often be convenient to write $f(x, p)$ in place of
$f_p(x)$ to denote functional dependence on both $x$ and $p$. We will assume that this joint
function of state and parameters, $f : M \times I_p \to M$, is continuous over its domain.

The data we are given consists of a sequence, $\{y_n\}$, of noisy observations of the state
vectors, $\{x_n\}$, where $y_n \in M$, $x_n \in M$, and:

\[ y_n \in B(x_n, \epsilon) \]

for all $n \in \mathbb{Z}$, where $\epsilon > 0$ and $B(x_n, \epsilon)$ represents an $\epsilon$-neighborhood of $x_n$ (i.e.,
$y_n \in B(x_n, \epsilon)$ if and only if $d(y_n, x_n) < \epsilon$ for some distance metric $d$). In other words, the
measured data, $y_n$, consists of the actual state of the system, $x_n$, plus some noise of
magnitude $\epsilon$ or less.

Note that if we fix $p_0 \in I_p$, we can generate an orbit, $\{x_n\}$, given an initial condition,
$x_0$. Basically, we would like to know how much information this state orbit contains
about the parameters of the system. In other words, within possible measurement error,
can we resolve $\{x_n\}$ from orbits of nearby systems in parameter space? In particular,
are there parameters near $p_0$ that have no orbits that closely follow $\{x_n\}$? If so, then
we know that such parameters could not possibly produce the state data represented by
$\{y_n\}$, and we can thus eliminate these parameters as possible choices for the parameter
estimate. Thus, given $p_0 \in I_p$ and a state orbit, $\{x_n\}$, of $f_{p_0}$, one important question
to ask is: For what values of $p \in I_p$ does there exist an orbit, $\{z_n\}$, of $f_p$ such that
$d(z_n, x_n) < \epsilon$ for all $n$?
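The elimination idea in the preceding paragraph can be illustrated for the quadratic map of Chapter 1. The sketch below is our own construction and is not the estimation algorithm developed later in this report. For a candidate $p$ it propagates the interval of states $z_n$ that are still compatible with the observations: $S_0 = [y_0 - \epsilon, y_0 + \epsilon]$ and $S_{n+1} = f_p(S_n) \cap [y_{n+1} - \epsilon, y_{n+1} + \epsilon]$. Since $f_p$ is continuous on an interval, each $S_n$ is again an interval, and $p$ can be eliminated as soon as some $S_n$ is empty. The names `quadratic_image` and `is_consistent`, the small padding `tol` that guards against roundoff, the uniform noise, and the candidate grid are all assumptions.

```python
import numpy as np

def quadratic_image(lo, hi, p):
    """Image of [lo, hi] (a subinterval of [0, 1]) under f_p(x) = p x (1 - x)."""
    f_lo, f_hi = p * lo * (1.0 - lo), p * hi * (1.0 - hi)
    # f_p is concave with maximum p/4 at x = 0.5, so the minimum sits at an endpoint.
    top = p * 0.25 if lo <= 0.5 <= hi else max(f_lo, f_hi)
    return min(f_lo, f_hi), top

def is_consistent(p, y, eps, tol=1e-12):
    """True if some orbit of f_p stays within eps of every observation in y."""
    lo, hi = max(y[0] - eps - tol, 0.0), min(y[0] + eps + tol, 1.0)
    if lo > hi:
        return False
    for obs in y[1:]:
        img_lo, img_hi = quadratic_image(lo, hi, p)
        lo = max(img_lo - tol, obs - eps - tol, 0.0)
        hi = min(img_hi + tol, obs + eps + tol, 1.0)
        if lo > hi:
            return False  # no state of f_p is compatible with the data: eliminate p
    return True

# Synthetic data: orbit of f_{3.9} from x_0 = 0.3 plus bounded uniform noise.
rng = np.random.default_rng(0)
P_TRUE, EPS = 3.9, 1e-3
x = np.empty(201)
x[0] = 0.3
for n in range(200):
    x[n + 1] = P_TRUE * x[n] * (1.0 - x[n])
y = x + rng.uniform(-EPS, EPS, size=x.shape)

candidates = np.linspace(3.89, 3.91, 2001)
feasible = [p for p in candidates if is_consistent(p, y, EPS)]
```

The parameters left in `feasible` are those that the data cannot rule out at noise level `EPS`; how this set shrinks as more observations are processed is the subject of the shadowing results discussed below.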

This relates parameter estimation to the concept of shadowing. Below we describe 
some definitions for various types of shadowing that will be useful later on: 

Definitions: Let $g : M \to M$ be continuous. Suppose $d(g(z_n), z_{n+1}) < \delta$ for all $n$.
Then $\{z_n\}$ is said to be a $\delta$-pseudo-orbit of $g$. We say that a sequence of states, $\{x_n\}$,
$\epsilon$-shadows another sequence of states, $\{y_n\}$, if $d(x_n, y_n) < \epsilon$ for all $n$. The map $g$ has
the pseudo-orbit shadowing property if for any $\epsilon > 0$, there is a $\delta > 0$ such that every
$\delta$-pseudo-orbit is $\epsilon$-shadowed by a real orbit of $g$. The family of maps, $\{f_p \mid p \in I_p\}$, is
said to have the parameter shadowing property at $p_0 \in I_p$ if for any $\epsilon > 0$, there exists a
$\delta > 0$ such that every orbit of $f_{p_0}$ is $\epsilon$-shadowed by some orbit of $f_p$ for any $p \in B(p_0, \delta)$.
Finally, suppose that $g \in X$, where $X$ is some metric space. Suppose further that for
any $\epsilon > 0$, there is a neighborhood of $g$, $U \subset X$, such that if $g' \in U$ then any orbit of $g$
is $\epsilon$-shadowed by an orbit of $g'$. Then $g$ is said to have a function shadowing property
in $X$.
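For finite sequences of scalar states, the first two definitions translate directly into checks like the ones sketched below. The helper names and the example are ours: the pseudo-orbit `z` is produced by rounding each iterate of the quadratic map with $p = 3.9$ to six decimal places, which introduces a one-step error of at most $5 \times 10^{-7}$, and the metric is taken to be $d(a, b) = |a - b|$.

```python
def is_pseudo_orbit(g, z, delta):
    """Check d(g(z_n), z_{n+1}) < delta along the finite sequence z."""
    return all(abs(g(z[n]) - z[n + 1]) < delta for n in range(len(z) - 1))

def eps_shadows(x, y, eps):
    """Check d(x_n, y_n) < eps for all n (sequences of equal length)."""
    return len(x) == len(y) and all(abs(a - b) < eps for a, b in zip(x, y))

def g(x):
    """Quadratic map with p = 3.9, used here only as an example."""
    return 3.9 * x * (1.0 - x)

x = [0.3]  # an orbit of g (up to floating-point roundoff)
z = [0.3]  # a pseudo-orbit of g: every iterate is rounded to 6 decimals
for _ in range(50):
    x.append(g(x[-1]))
    z.append(round(g(z[-1]), 6))

z_is_pseudo_orbit = is_pseudo_orbit(g, z, 1e-6)
x_shadows_z = eps_shadows(x, z, 1e-6)
```

Note that `x_shadows_z` only tests one particular orbit, the one sharing the initial condition of `z`. The pseudo-orbit shadowing property asks instead whether some orbit of $g$ stays close to `z`, which is a weaker requirement.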

We can see that the various types of shadowing have natural connections to parameter
estimation. If two orbits $\epsilon$-shadow each other, then these two orbits will (to first order)
be indistinguishable from each other with measurement noise of magnitude $\epsilon$. If $f_{p_0}$ has
the parameter shadowing property, then all systems near $p = p_0$ in parameter space have
orbits that $\epsilon$-shadow orbits of $f_{p_0}$. This implies inherent constraints on the attainable
accuracy of parameter estimation based on state data, since observable state differences 
for nearby systems in parameter space are lost in the noise caused by measurement 
errors. 

Thus parameter shadowing is really the property we are most interested in because 
of its direct relationship with parameter estimation. The concept of function shadowing 
is simply a generalization of parameter shadowing so that given some function g, we can 
guarantee that any continuous parameterization of systems containing g must have the 
parameter shadowing property at g. This situation implies that the state evolution of 
the system is in some sense stable or insensitive to small perturbations in the system. 
In the literature, the following language is used to describe this sort of "stability:" 

Definitions: Two continuous maps, $f : M \to M$ and $g : M \to M$, are said to be
topologically conjugate if there exists a homeomorphism, $h$, such that $gh = hf$. Let $\mathrm{Diff}^r(M)$
be the space of $C^r$ diffeomorphisms of $M$. Then $g \in \mathrm{Diff}^r(M)$ is said to be structurally
stable if for every neighborhood, $U \subset \mathrm{Diff}^0(M)$, of the identity function, there is a
neighborhood, $V \subset \mathrm{Diff}^r(M)$, of $g$ such that for each $f \in V$ there exists a
homeomorphism, $h_f \in U$, satisfying $f = h_f^{-1} g h_f$. In addition, if there exists a constant $K > 0$ and
a neighborhood $V' \subset V$ of $g$ such that $\sup_{x \in M} d(h_f(x), x) \le K \sup_{x \in M} d(f(x), g(x))$ for
any $f \in V'$, then $g$ is said to be absolutely structurally stable.

Unfortunately, we have introduced a rather large number of definitions. Some of the 
definitions apply directly to parameter estimation, and others are introduced because 
they are historically important and are necessary in order to apply results found in the 
literature. Before we continue, it is important to state clearly how the various properties 
are related and exactly what they mean for parameter estimation. 



2.2 Shadowing and structural stability 

We now investigate the relationship between various shadowing properties and structural 
stability. The goal here is to relate well-known concepts like pseudo-orbit shadowing and 
structural stability to parameter and function shadowing, so that we can apply results 
from the literature. 

Let us begin with a brief discussion. First of all, given any $p \in I_p$, note that if $p$ is
near $p_0$, then orbits of $f_p$ are pseudo-orbits of $f_{p_0}$. The pseudo-orbit shadowing property
implies that a particular system can shadow all trajectories of nearby systems. That 
is, any orbit of a nearby system can be shadowed by an orbit of the given system. On 
the other hand, function shadowing is somewhat the opposite. A system exhibits the 
function shadowing property if all nearby systems can shadow it. Meanwhile, structural 
stability implies a one-to-one correspondence between orbits of all systems within a given 
neighborhood in function space. Thus, if a system is structurally stable, then all nearby 
systems can shadow each other. 
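
The remark that orbits of a nearby system are pseudo-orbits of the given system can be checked numerically. The following is a minimal sketch, assuming the logistic family $f_p(x) = px(1-x)$ on $[0, 1]$ as the example (it is not a diffeomorphism, so it only illustrates the pseudo-orbit idea, not the theorems of this chapter). The parameter values, the initial condition, and the helper names `orbit` and `pseudo_orbit_defect` are illustrative assumptions. Since $|f_p(x) - f_{p_0}(x)| = |p - p_0|\,x(1-x) \le |p - p_0|/4$, every orbit of $f_p$ should be a $\delta$-pseudo-orbit of $f_{p_0}$ for any $\delta > |p - p_0|/4$.

```python
def logistic(p, x):
    """One step of the logistic family f_p(x) = p * x * (1 - x)."""
    return p * x * (1.0 - x)


def orbit(p, x0, n):
    """Return the orbit segment [x0, f_p(x0), ..., f_p^n(x0)]."""
    xs = [x0]
    for _ in range(n):
        xs.append(logistic(p, xs[-1]))
    return xs


def pseudo_orbit_defect(p0, zs):
    """Largest one-step error max_n |f_{p0}(z_n) - z_{n+1}| along the sequence zs."""
    return max(abs(logistic(p0, a) - b) for a, b in zip(zs, zs[1:]))


p0, p = 3.9, 3.9001            # illustrative parameter values (assumed)
zs = orbit(p, 0.3, 200)        # a true orbit of f_p
delta = pseudo_orbit_defect(p0, zs)
bound = abs(p - p0) / 4.0      # sup over [0, 1] of |f_p(x) - f_{p0}(x)|
print(delta, bound)
```

The measured defect never exceeds the bound, no matter how wildly the two systems' trajectories diverge from a common initial condition; being a pseudo-orbit is a one-step property.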

While these three properties are not equivalent in general they are apparently equiv- 
alent for certain types of expansive maps, where the definition of expansiveness is given 
below: 

Definitions: A homeomorphism $g : M \to M$ is said to be expansive if there exists
$e(g) > 0$ such that

$$d(g^n(x), g^n(y)) < e(g)$$

for all $n \in \mathbb{Z}$ if and only if $x = y$.$^1$ $e(g)$ is called the expansive constant for $g$. Also, suppose
$X$ is a metric space of homeomorphisms. Then a function $g \in X$ is uniformly expansive
in $X$ if there exists a neighborhood $V \subset X$ of $g$ such that $\inf_{f \in V} e(f) > 0$.

$^1$ Note that in general, if $g$ is a function then we will write $g^n$ to mean the function $g$ composed with
itself $n$ times. We assume that $g^0$ is the identity function.

We now state some properties relating pseudo-orbit shadowing, function shadowing,
and structural stability. Many of these results are addressed by Walters in [62]. We refer
the reader to [62] and fill in the gaps as necessary in Appendix A.

Theorem 2.2.1 Let $g : M \to M$ be a structurally stable diffeomorphism. Then $g$ has
the function shadowing property.

Proof: This follows directly from the definitions of structural stability and function
shadowing. The conjugating homeomorphism, $h$, from the definition of structural stability
provides a one-to-one connection between shadowing orbits of nearby maps.

Theorem 2.2.2 (Walters) Let $g : M \to M$ be a structurally stable diffeomorphism of
dimension $\ge 2$. Then $g$ has the pseudo-orbit shadowing property.

Proof: This follows directly from Theorem 11 of [62]. The proof is not as simple as the
previous theorem, since a pseudo-orbit of $g$ is not necessarily a real orbit of a nearby
map. However, Walters shows that given a pseudo-orbit of $g$, we can pick a (possibly)
different pseudo-orbit of $g$ that both shadows the original pseudo-orbit and is in fact a
true orbit of a nearby map. Then structural stability can be invoked to show that
there must be a real orbit of $g$ that shadows the original pseudo-orbit.

Theorem 2.2.3 Let $g : M \to M$ be an expansive diffeomorphism with the pseudo-orbit
shadowing property. Suppose there exists a neighborhood, $V \subset \mathrm{Diff}^1(M)$, of $g$ that is
uniformly expansive. Then $g$ is structurally stable.

Proof: This follows from discussions in [62]. See Appendix A.

Theorem 2.2.4 Let $g : M \to M$ be an expansive diffeomorphism with the function
shadowing property. Suppose there exists a neighborhood, $V \subset \mathrm{Diff}^1(M)$, of $g$ such that
$V$ is uniformly expansive. Then $g$ is structurally stable.

Proof: This is similar to theorem 4 of [62]. See Appendix A.

Summarizing our results relating various forms of shadowing and structural stability,
we find that structural stability is the strongest condition considered. Structural
stability of a diffeomorphism of greater than one dimension implies both the pseudo-orbit
shadowing and parameter shadowing properties for continuous families of mappings.
Thus we can use the literature on structural stability to show that certain families of
maps must have parameter shadowing properties, making it difficult to accurately
estimate parameters given state data. As we shall see, however, most systems we are likely
to encounter in physical applications are not structurally stable.

Also, the pseudo-orbit shadowing property, parameter shadowing property, and
structural stability are equivalent for expansive diffeomorphisms $g : M \to M$ of dimension
greater than one if there exists a neighborhood of $g$ in $\mathrm{Diff}^1(M)$ that is uniformly
expansive. However, again we shall see that most physical systems do not have this
expansiveness property. Note also that these results do not apply to the maps of the
interval which we consider in the next chapter.



2.3 Absolute structural stability and parameter estimation

There is one more useful property we have not yet addressed. That is the concept of 
absolute structural stability. 

Lemma 2.3.1 Suppose that $f_p \in \mathrm{Diff}^1(M)$ for $p \in I_p \subset \mathbb{R}$, and let $f(x, p) = f_p(x)$
for any $x \in M$. Suppose that $f : M \times I_p \to M$ is $C^1$ and that $f_{p_0}$ is an absolutely
structurally stable diffeomorphism for some $p_0 \in I_p$. Then there exist $\epsilon_0 > 0$ and $K > 0$
such that for every positive $\epsilon < \epsilon_0$, any orbit of $f_{p_0}$ can be $\epsilon$-shadowed by an orbit of $f_p$
if $p \in B(p_0, K\epsilon)$.

Proof: This follows fairly directly from the definition of absolute structural stability. 
The conjugating homeomorphism provides the connection between shadowing orbits. 
See Appendix A for a complete explanation. 
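
The following worked sketch indicates how the conjugacy yields the linear bound. It is not the Appendix A proof. It assumes that $f$ is Lipschitz in $p$ uniformly in $x$ with some constant $L$ (which holds, for example, if $M$ is compact, since $f$ is $C^1$), and it writes $K_s$ for the constant in the definition of absolute structural stability to distinguish it from the $K$ of the lemma. Both $L$ and $K_s$ are names introduced here for illustration.

```latex
% Assumptions (labels introduced for this sketch):
%   (i)  \sup_x d(f_p(x), f_{p_0}(x)) \le L\,|p - p_0|
%   (ii) f_p = h_p^{-1} f_{p_0} h_p  with
%        \sup_x d(h_p(x), x) \le K_s \sup_x d(f_p(x), f_{p_0}(x))
\begin{align*}
&\text{Let } x_{n+1} = f_{p_0}(x_n) \text{ and define } y_n = h_p^{-1}(x_n). \\
&f_p(y_n) = h_p^{-1} f_{p_0} h_p\, h_p^{-1}(x_n) = h_p^{-1}(x_{n+1}) = y_{n+1},
  \quad\text{so } \{y_n\} \text{ is an orbit of } f_p. \\
&d(y_n, x_n) = d\bigl(y_n, h_p(y_n)\bigr)
  \le \sup_{x \in M} d\bigl(h_p(x), x\bigr)
  \le K_s L\,|p - p_0|. \\
&\text{Hence } d(y_n, x_n) < \epsilon \text{ for all } n
  \text{ whenever } |p - p_0| < K\epsilon, \qquad K = \frac{1}{K_s L}.
\end{align*}
```

In this sketch the restriction $\epsilon < \epsilon_0$ simply keeps $f_p$ inside the neighborhood of $f_{p_0}$ on which the bound (ii) is available.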

Thus if an absolutely structurally stable mapping, $g$, is a member of a continuous
parameterization of mappings, then nearby maps in parameter space can $\epsilon$-shadow any
orbit of $g$. Furthermore, from above we see that the range of parameters that can shadow
orbits of $g$ varies at most linearly with $\epsilon$ for sufficiently small $\epsilon$, so that decreasing the
measurement error will not result in any dramatic improvements in estimation accuracy.
In these systems, it is clear that dynamics does not contribute a great deal to our ability
to distinguish between the behavior of nearby systems. In the next section, we shall see
that so-called uniformly hyperbolic systems can exhibit this absolute structural stability
property, making them poor systems for accurate parameter estimation.






2.4 Uniformly hyperbolic systems 

Let us now turn our attention to identifying what types of systems exhibit the
various shadowing and structural stability properties described in the previous section.
Stability is intimately associated with hyperbolicity, so we begin by examining uniformly
hyperbolic systems.

Uniformly hyperbolic systems are interesting as the archetypes for complex behavior
in nonlinear systems. Because of the definite structure available in such systems, it is
generally easier to prove results in this case than for more general situations. Unfortunately,
from a practical viewpoint, very few physical systems actually exhibit the properties of
uniform hyperbolicity. Nevertheless, understanding hyperbolicity is important as a first
step to figuring out what is happening in more general situations.

Our goal in this section is to state some stability results for hyperbolic systems, and 
to motivate the connections between hyperbolicity, stability, and parameter estimation. 
Most of the results in this section are well-known and have been written about in numer- 
ous sources. The material provided here outlines some of the properties of hyperbolic 
systems that pertain to our treatment of parameter estimation. The brief discussions 
use informal arguments in an attempt to motivate ideas rather than provide proofs. 
References to more rigorous proofs are given. For an overview of some of the material in 
this section, a few good sources include: Shub [55], Nitecki [43], Palis and de Melo [50], 
or Newhouse [42]. 

We first need to know what it means to be hyperbolic:

Definitions:

(1) Given $g : M \to M$, $\Lambda$ is a (uniformly) hyperbolic set of $g$ if there exists a continuous
invariant splitting of the tangent bundle, $T_x M = E^s_x \oplus E^u_x$ for all $x \in \Lambda$, and
constants $C > 0$ and $\lambda > 1$ such that:
(a) $|Dg^n v| < C \lambda^{-n} |v|$ if $v \in E^s_x$, $n \ge 0$
(b) $|Dg^{-n} v| < C \lambda^{-n} |v|$ if $v \in E^u_x$, $n \ge 0$

(2) A diffeomorphism $g : M \to M$ is said to be Anosov if $M$ is uniformly hyperbolic.
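
As a concrete instance of these definitions, the sketch below checks the contraction and expansion rates for a linear example. It assumes the Arnold cat map, induced on the 2-torus by the matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, which is a standard Anosov diffeomorphism but is not discussed in the text. For this map $Dg = A$ at every point, the splitting is given by the eigenvectors of $A$, and $\lambda = (3 + \sqrt{5})/2$. The helper names `apply` and `growth` are hypothetical.

```python
import math

A = ((2.0, 1.0), (1.0, 1.0))          # Arnold cat map matrix (assumed example)
A_INV = ((1.0, -1.0), (-1.0, 2.0))    # its inverse (det A = 1)


def apply(mat, v):
    """Multiply the 2x2 matrix mat by the vector v."""
    return (mat[0][0] * v[0] + mat[0][1] * v[1],
            mat[1][0] * v[0] + mat[1][1] * v[1])


def growth(mat, v, n):
    """Return |mat^n v| / |v|."""
    w = v
    for _ in range(n):
        w = apply(mat, w)
    return math.hypot(*w) / math.hypot(*v)


lam = (3.0 + math.sqrt(5.0)) / 2.0    # expanding eigenvalue, lam > 1
e_u = (1.0, lam - 2.0)                # eigenvector for lam      (unstable direction)
e_s = (1.0, 1.0 / lam - 2.0)          # eigenvector for 1 / lam  (stable direction)

n = 10
stable_rate = growth(A, e_s, n)        # |Dg^n v| / |v| for v in E^s
unstable_rate = growth(A_INV, e_u, n)  # |Dg^{-n} v| / |v| for v in E^u
print(stable_rate * lam ** n, unstable_rate * lam ** n)
```

Both printed ratios should be close to 1, meaning that vectors in each subspace shrink at exactly the rate $\lambda^{-n}$, so conditions (a) and (b) hold for this example with any constant $C > 1$.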

One important property for understanding the behavior of hyperbolic systems is
the existence of smooth uniformly contracting and expanding manifolds.

Definition: We define the local stable, $W^s_\epsilon(x, g)$, and unstable, $W^u_\epsilon(x, g)$, sets of
$g : M \to M$ as follows:

$$W^s_\epsilon(x, g) = \{ y \in M : d(g^n(x), g^n(y)) < \epsilon \text{ for all } n \ge 0 \}$$
$$W^u_\epsilon(x, g) = \{ y \in M : d(g^{-n}(x), g^{-n}(y)) < \epsilon \text{ for all } n \ge 0 \}$$

We define the global stable, $W^s(x, g)$, and unstable, $W^u(x, g)$, sets of $g : M \to M$ as
follows:

$$W^s(x, g) = \{ y \in M : d(g^n(x), g^n(y)) \to 0 \text{ as } n \to \infty \}$$
$$W^u(x, g) = \{ y \in M : d(g^{-n}(x), g^{-n}(y)) \to 0 \text{ as } n \to \infty \}.$$

The following result shows that these sets have definite structure. Based on this
result, we replace the word "set" with the word "manifold" in the definitions above, so,
for example, $W^s(x, g)$ and $W^u(x, g)$ are the stable and unstable manifolds of $g$ at $x$.

Theorem 2.4.1 (Stable/unstable manifold theorem for hyperbolic sets): Let $g : M \to M$
be a $C^r$ diffeomorphism ($r \ge 1$), and let $\Lambda \subset M$ be a compact invariant hyperbolic
set under $g$. Then for sufficiently small $\epsilon > 0$ the following properties hold for $x \in \Lambda$:

(1) $W^s_\epsilon(x, g)$ and $W^u_\epsilon(x, g)$ are local $C^r$ disks for any $x \in \Lambda$. $W^s_\epsilon(x, g)$ is tangent to $E^s_x$
at $x$ and $W^u_\epsilon(x, g)$ is tangent to $E^u_x$ at $x$.

(2) There exist constants $C > 0$ and $\lambda > 1$ such that:

$$d(g^n(x), g^n(y)) < C\lambda^{-n} \text{ for all } n \ge 0 \text{ if } y \in W^s_\epsilon(x)$$
$$d(g^{-n}(x), g^{-n}(y)) < C\lambda^{-n} \text{ for all } n \ge 0 \text{ if } y \in W^u_\epsilon(x).$$

(3) $W^s_\epsilon(x)$ and $W^u_\epsilon(x)$ vary continuously with $x$.

(4) We can choose an adapted metric such that $C = 1$ in (2).

Proof: See Nitecki [43] or Shub [55].

Note that from (2) above, we can see that our definitions for the global stable
and unstable manifolds are natural extensions of the local manifolds. In particular,
$W^s_\epsilon(x, g) \subset W^s(x, g)$, $W^u_\epsilon(x, g) \subset W^u(x, g)$, and:

$$W^s(x, g) = \bigcup_{n \ge 0} g^{-n}\bigl(W^s_\epsilon(g^n x)\bigr)$$

$$W^u(x, g) = \bigcup_{n \ge 0} g^{n}\bigl(W^u_\epsilon(g^{-n} x)\bigr).$$

Thus $C^r$ stable and unstable manifolds vary continuously, and intersect transversally on
hyperbolic sets, meaning that the angle of intersection between the stable and unstable
manifolds is bounded away from zero on $\Lambda$. These manifolds create a foliation of
uniformly contracting and expanding sets that provides for a definite structure of the
space. We will now argue that uniformly hyperbolic systems obey shadowing properties
and are structurally stable.

Lemma 2.4.1 (Shadowing Lemma): Let $g : M \to M$ be a $C^r$ diffeomorphism ($r \ge 1$),
and let $\Lambda \subset M$ be a compact invariant hyperbolic set under $g$. Then there exists a
neighborhood, $U \subset M$, of $\Lambda$ such that $g$ has the pseudo-orbit shadowing property on $U$.
That is, given $\epsilon > 0$, there exists $\delta > 0$ such that if $\{z_n\}$ is a $\delta$-pseudo-orbit of $g$, with
$z_n \in U$ for all $n$, then $\{z_n\}$ is $\epsilon$-shadowed by a real orbit, $\{x_n\}$, of $g$ such that $x_n \in \Lambda$
for all integer $n$.

Proof: Proofs for this result can be found in [7] and [55]. Here we sketch an informal
argument similar to the one given by Conley [16] and Ornstein and Weiss [47] for the
case where $g$ is Anosov (i.e., $\Lambda = M$ is hyperbolic).

Let $\{z_n\}$ be a $\delta$-pseudo-orbit of $g$ and let $B_n = B(z_n, \epsilon)$. For the pseudo-orbit
shadowing property to be true, there must be a real orbit, $\{x_n\}$, of $g$ such that $x_n \in B_n$ for
all integer $n$. Thus it is sufficient to show that for any $\epsilon > 0$ there is a $\delta > 0$ such that
given any $\delta$-pseudo-orbit of $g$, $\{z_n\}$, there exists $x_0 \in \Lambda$ satisfying:

$$x_0 \in \bigcap_{n \in \mathbb{Z}} g^{-n}\bigl(B(z_n, \epsilon)\bigr). \qquad (2.1)$$

Since the stable and unstable manifolds intersect transversally (at angles uniformly
bounded away from zero), for any $p \in \Lambda$, we can use the structure of the manifolds around
$p$ to define a local coordinate system for uniformly large neighborhoods of $p \in \Lambda$.$^2$ We
can think of this as locally mapping the stable and unstable manifolds onto a patch of $\mathbb{R}^n$
such that stable and unstable manifolds lie parallel to the axes of a Cartesian grid (see
figure 2.1). Also we can choose an adapted metric on $\Lambda$ (specified in part (4) of the stable
manifold theorem), for each $p \in \Lambda$ so that $g$ has uniform local contraction/expansion
rates. Using this metric on the transformed coordinates, we have a nice, neat model of
local dynamical behavior, as we shall see below. From now on we deal exclusively with
transformed local coordinates centered around $z_n$ and the adapted metric. Note that the
discussion below and the pictures reflect the two-dimensional case (the idea is similar in
higher dimensions).

Now for all $n$ pick squares, $S(z_n, \epsilon) = S_n$, of uniformly bounded size centered at
$z_n$ with $S(z_n, \epsilon) \subset B(z_n, \epsilon)$ such that the sides of $S_n$ are parallel to the axes of the
transformed coordinate system around $z_n$. The sides of the $S_n$ squares are fibered by
stable and unstable manifolds, so when we apply $g$ to $S_n$, the square is stretched into
a rectangle, expanding along the unstable direction, contracting in the stable direction.
Meanwhile, the opposite is true for $g^{-1}$. Note that if we can show that there exists some
$x_0 \in \Lambda$ and $\epsilon > 0$ such that:

$$x_0 \in \bigcap_{n \in \mathbb{Z}} g^{-n}\bigl(S(z_n, \epsilon)\bigr)$$

for any sequence, $\{z_n\}$, that is a $\delta$-pseudo-orbit of $g$, then the shadowing property must
be true. This is our goal.

$^2$ The local coordinates we refer to here are known as canonical coordinates. For a more rigorous
explanation of these coordinates refer to Smale [59] or Nitecki [43].

Figure 2.2: For any $\epsilon > 0$ we can choose $\delta > 0$ so that for any $n \in \mathbb{Z}$, (a) any line segment,
$a^u_n$, along the unstable direction in $S_n$ gets mapped by $g$ so that it intersects $S_{n+1}$, and (b)
any line segment, $a^s_n$, along the stable direction in $S_n$ gets mapped by $g^{-1}$ so that it intersects
$S_{n-1}$.

Let $n \in \mathbb{Z}$ and let $a^u_n$ be any line segment extending the length of a side of $S(z_n, \epsilon)$
parallel to the unstable direction inside $S(z_n, \epsilon)$. Set $a^u_{n+1} = g(a^u_n) \cap S(z_{n+1}, \epsilon)$. Then,
for any $\epsilon > 0$, we can choose a suitably small $\delta_1 > 0$, such that for any $n$, $a^u_{n+1}$ must be
nonempty if $\{z_n\}$ is a $\delta_1$-pseudo-orbit of $g$ (see figure 2.2). In figure 2.2 we see that
$\delta_1 > 0$ represents the possible offset between the centers of the rectangle, $g(S_n)$, and
the square, $S_{n+1}$. As $\epsilon$ gets smaller, the size of the rectangle and square gets smaller, but
we can still choose a suitably small $\delta_1 > 0$ so that $g(a^u_n)$ intersects $S_{n+1}$. Furthermore
we can do exactly the same thing in the opposite direction. That is, let $a^s_n$ be any line
segment extending along the stable direction of $S(z_n, \epsilon)$, set $a^s_{n-1} = g^{-1}(a^s_n) \cap S(z_{n-1}, \epsilon)$,
and choose $\delta_2 > 0$ suitably small so that $a^s_{n-1}$ must be nonempty for any $n$ if $\{z_n\}$ is a
$\delta_2$-pseudo-orbit of $g$.

Given any $\epsilon > 0$ set $\delta = \min\{\delta_1, \delta_2\}$. Then, for any $n > 0$, let $a^s_n(n)$ be a segment
in $S_n = S(z_n, \epsilon)$ parallel to the stable direction. Set $a^s_{k-1}(n) = g^{-1}(a^s_k(n)) \cap S_{k-1}$ for
any $k \le n$. From our previous arguments we know that as long as $\{z_n\}$ is a $\delta$-pseudo-orbit
of $g$, then $a^s_{k-1}(n)$ must be a (nonempty) line in the stable direction within $S_{k-1}$
if $a^s_k(n)$ is a line in the stable direction of $S_k$. Consequently, by induction, $a^s_0(n)$ must
be a line in the stable direction of $S_0$ for any $n > 0$. Furthermore note that $a^s_k(n) \subset S_k$
for any $k \in \{0, 1, \ldots, n\}$. Doing a similar thing for $n < 0$, working with $g$ instead of
$g^{-1}$, and starting with a segment $a^u_n(n)$ parallel to the unstable direction of $S_n$, we
see that for any $n < 0$ there exists a series of line segments, $a^u_k(n) \subset S_k$, for each
$k \in \{n, n + 1, \ldots, -1, 0\}$ oriented in the unstable direction. Clearly $a^u_0(-n)$ and $a^s_0(n)$
must intersect for any $n > 0$. Now consider the limit of this process as $n \to \infty$. It is easy
to show that the intersection point

$$x_0 = \Bigl(\lim_{n \to \infty} a^s_0(n)\Bigr) \cap \Bigl(\lim_{n \to \infty} a^u_0(-n)\Bigr)$$

must exist and must in fact be the $x_0$ we seek satisfying (2.1). This initial condition can
then be used to generate a suitable shadowing orbit, $\{x_n\}$.
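
The idea of the argument, that the future of the pseudo-orbit pins down the unstable coordinate of the shadowing point while the past pins down the stable coordinate, can be tried out on a toy example. The sketch below is only an illustration under strong assumptions: it uses the linear hyperbolic map $g(x, y) = (2x, y/2)$ on the plane (not a map from the text), a finite one-sided pseudo-orbit rather than a bi-infinite one, and the max norm. The noise level, orbit length, and function names are all hypothetical choices.

```python
import random


def g(pt):
    """Linear hyperbolic map: expands x (unstable) by 2, contracts y (stable) by 1/2."""
    x, y = pt
    return (2.0 * x, 0.5 * y)


def make_pseudo_orbit(z0, n, delta, rng):
    """Build z_0..z_n where each step is g plus a kick of size at most delta (max norm)."""
    zs = [z0]
    for _ in range(n):
        gx, gy = g(zs[-1])
        zs.append((gx + rng.uniform(-delta, delta),
                   gy + rng.uniform(-delta, delta)))
    return zs


def shadowing_orbit(zs):
    """True orbit of g chosen to stay close to the pseudo-orbit zs."""
    n = len(zs) - 1
    x0 = zs[-1][0] / 2.0 ** n   # unstable coordinate: pull the last point back n steps
    y0 = zs[0][1]               # stable coordinate: take it from the first point
    xs = [(x0, y0)]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs


def max_distance(zs, xs):
    """Largest max-norm distance between corresponding points."""
    return max(max(abs(z[0] - x[0]), abs(z[1] - x[1])) for z, x in zip(zs, xs))


delta = 1e-3
zs = make_pseudo_orbit((0.1, 0.1), 20, delta, random.Random(0))
xs = shadowing_orbit(zs)
eps = max_distance(zs, xs)
print(eps, 2.0 * delta)
```

For this map the errors in each coordinate form a geometric series with ratio $1/2$, so the true orbit stays within $2\delta$ of the pseudo-orbit regardless of its length. Starting the true orbit at $z_0$ instead would let the unstable error roughly double at each step.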

Theorem 2.4.2 Anosov diffeomorphisms are structurally stable. 

Proof: Proofs for this result can be found in [4] and [37]. 

It is also possible to prove this result based on the shadowing lemma. The basic idea
is to show that any Anosov diffeomorphism, $g : M \to M$, is uniformly expansive, and
then to apply theorem 2.2.3 to get structural stability. Walters does this in [62]. We
outline the arguments.

The fact that $g$ is expansive is not too difficult to show. If this were not true, then
there must exist $x \ne y$ such that $d(g^n(x), g^n(y)) < \epsilon$ for all integer $n$. But satisfying this
condition for both $n \ge 0$ and $n \le 0$ would imply that $y \in W^s_\epsilon(x, g)$ and $y \in W^u_\epsilon(x, g)$,
respectively. This cannot happen unless $x = y$. The contradiction shows that the Anosov
diffeomorphism, $g$, must be expansive with expansive constant, $e(g) \ge \epsilon$, where $\epsilon > 0$ is
as specified in the stable manifold theorem.

The next step is to observe that there exists a neighborhood, $U$, of $g$ in $\mathrm{Diff}^1(M)$
such that any $f \in U$ is Anosov. Then since the stable and unstable manifolds $W^s_\epsilon(x, f)$
and $W^u_\epsilon(x, f)$ vary continuously with respect to $f \in U$ ([28]),$^3$ we can show that there
exists a neighborhood, $U' \subset U$, of $g$ such that $U'$ is uniformly expansive. Since $g$
has the pseudo-orbit shadowing property, we can apply theorem 2.2.3 to conclude that
Anosov diffeomorphisms must be structurally stable. This completes our explanation of
theorem 2.4.2.

Theorem 2.4.2 is not, however, the most general statement we can make. We need a
few more definitions before we can proceed to the final result in theorem 2.4.3.

Definitions:

(1) A point $x$ is nonwandering if for every neighborhood, $U$, of $x$, there exists arbitrarily
large $n$ such that $f^n(U) \cap U$ is nonempty.

(2) A diffeomorphism $f : M \to M$ satisfies Axiom A if:

(a) the nonwandering set, $\Omega(f) \subset M$, is hyperbolic.

(b) the periodic points of $f$ are dense in $\Omega(f)$.

(3) We say that $f$ satisfies the strong transversality property if for every $x \in M$,

$$E^s_x \oplus E^u_x = T_x M.$$

Theorem 2.4.3 (Franks) If $f : M \to M$ is $C^2$ then $f$ is absolutely structurally stable
if and only if $f$ satisfies Axiom A and the strong transversality property.

Proof: See Franks [21].

Intuitively, this result seems to be similar to our discussion of Anosov systems, except
that hyperbolicity is not available everywhere. However, there has been a great deal of
research into questions concerning structural stability, especially whether structurally
stable $f \in \mathrm{Diff}^1(M)$ implies that $f$ satisfies Axiom A and the strong transversality
property. The reader may refer to [55] for discussions and references to this work.



$^3$ Instead of hiding the details in this statement about stable and unstable manifolds, [62] gives a
more direct argument (but one that requires math background which I have tried to avoid in the text).
Let $B(M, M)$ be the Banach manifold of all maps from $M$ to $M$ and let $\Phi_f : B(M, M) \to B(M, M)$ so
that $\Phi_f(h) = f h g^{-1}$. If $f = g$, $\Phi_g(h)$ has a hyperbolic fixed point near the identity function, id (where
by hyperbolic we mean that the spectrum of the tangent map, $T_h \Phi$, is disjoint from the unit circle).
Thus for any $f \in U$, $\Phi_f(h)$ has a hyperbolic fixed point near id, and, since $g$ is expansive, this shows
uniform expansiveness for $f \in U$.

For our purposes, however, we now summarize the implications of theorem 2.4.3 for
parameter estimation:

Corollary 2.4.1 Suppose that $f_p \in \mathrm{Diff}^1(M)$ for $p \in I_p \subset \mathbb{R}$, and let $f(x, p) = f_p(x)$
for any $x \in M$. Suppose also that $f : M \times I_p \to M$ is $C^1$ and that for some $p_0 \in I_p$,
$f_{p_0}$ is a $C^2$ Axiom A diffeomorphism with the strong transversality property. Then there
exist $\epsilon_0 > 0$ and $K > 0$ such that for every positive $\epsilon < \epsilon_0$, any orbit of $f_{p_0}$ can be
$\epsilon$-shadowed by an orbit of $f_p$ if $p \in B(p_0, K\epsilon)$.

In other words, $C^2$ Axiom A diffeomorphisms with the strong transversality property satisfy
a function shadowing property. They are stable in such a way that their dynamics does
not magnify differences in parameter values. Chaotic behavior clearly does not lead to
improved parameter estimates in this case. However, as noted earlier, most known
physical systems do not satisfy the rather stringent conditions of uniform hyperbolicity. In
the next two chapters we will investigate results for some systems that are not uniformly
hyperbolic, beginning with the simplest possible case: dynamics in one dimension.

Chapter 3

Maps of the interval 



In the last chapter we examined systems that are uniformly hyperbolic. In this case, 
orbits of nearby systems have the same topological properties and shadow each other for 
arbitrarily long periods of time. We now consider what happens for other types
of systems. To start, we will investigate one-dimensional maps, specifically,
maps of the interval. One-dimensional maps are useful because they are the simplest 
systems to analyze; yet as we shall see, even in one dimension there is a great variety of 
possible behavior, especially if one is interested in geometric relationships between the 
shadowing orbits of nearby systems. Such relationships are important in assessing the 
feasibility of parameter estimation, since they determine whether nearby systems can be 
distinguished from each other in parameter space. 

In section 3.1 we begin with a brief overview of which maps of the interval are
structurally stable, and in section 3.2 we look at function shadowing properties of these maps.
Our purpose here is not to classify maps according to their properties. Although it is
important to know what types of systems exhibit various shadowing properties, the main goal
is to distill out some archetypal mechanisms that may be present in a number of inter- 
esting nonlinear systems. Especially of interest are any mechanisms that may help us 
understand what occurs in higher dimensional problems. 

In the process of investigating function shadowing, we will examine how the
"folding" behavior around turning points (i.e., relative maxima or minima) of one-dimensional
maps governs how orbits shadow each other. This investigation will be extended in
section 3.3, where we consider how folding behavior can often lead naturally to asymmetrical
shadowing behavior in the parameter space of maps. This, at least, gives us some hint
as to why we see asymmetrical behavior in a wide variety of numerical experiments. As
we will see in chapter 5, this asymmetrical shadowing behavior seems to be crucial in
developing methods for estimating parameters, so it is important to try to understand
where the behavior comes from.

In order to get definite results, we will restrict our claims to increasingly narrow
classes of mappings. In section 3.4 we will apply our results to a specific example,
namely the one-parameter family of maps we examined in chapter 1:

$$f_p(x) = p x (1 - x).$$

Finally, in section 3.6, we conclude with a number of conjectures and suggestions for
further research into parameter dependence in one-dimensional maps.



3.1 Structural stability 

We first want to examine what types of maps of the interval are structurally stable. These
are not the types of maps we are particularly interested in for purposes of parameter
estimation, but it is good to identify which maps they are. We briefly state some known
results, most of which can be found in de Melo and van Strien [33].$^1$

Note that since interesting behavior for maps of the interval occurs only in
non-invertible systems, we must slightly revise some of the definitions of the previous chapter in
order to account for this. In particular, instead of bi-infinite orbits, we now deal only
with forward orbits. These revisions apply, for example, in the definitions for various
types of shadowing. Unless we mention a new definition explicitly, the changes are as
one would expect.

Let us, however, make the following new definitions, some of which may be a bit
different from the analogous terms from chapter 2. In the definitions that follow (and
this chapter in general) assume that $I \subset \mathbb{R}$ is a compact interval of the real line.

Definitions: Suppose that $f : I \to I$ is continuous. Then the turning points of $f$ are
the local extrema of $f$ in the interior of $I$. $C(f)$ is used to designate the set of all turning
points of $f$ on $I$. Let $C^r(I, I)$ be the set of continuous maps on $I$ such that $f \in C^r(I, I)$
if the following two conditions hold:
(a) $f$ is $C^r$ (for $r \ge 0$)
(b) $f(I) \subset I$.
If in addition, we have that
(c) $f(\mathrm{Bd}(I)) \subset \mathrm{Bd}(I)$ (where $\mathrm{Bd}(I)$ denotes the boundary of $I$),
then we say that $f \in C(I, I)$.

For either $f, g \in C^r(I, I)$ or $f, g \in C(I, I)$, let $d(f, g) = \sup_{x \in I} |f(x) - g(x)|$.

Definitions:

(1) $f \in C(I, I)$ is said to be $C^r$ structurally stable if there exists a neighborhood $U$ of
$f$ in $C^r(I, I)$ such that for every $g \in U$, there exists a homeomorphism $h_g : I \to I$
such that $g h_g = h_g f$.

(2) Let $f : I \to I$. The $\omega$-limit set of a point, $x \in I$, is:

$$\omega(x) = \{ y \in I : \text{there exists a subsequence } \{n_i\} \text{ such that } f^{n_i}(x) \to y \}.$$

$B$ is said to be the basin of a hyperbolic periodic attractor if $B = \{x \in I : p \in \omega(x)\}$
where $p$ is a periodic point of $f$ with period $n$ and $|Df^n(p)| < 1$.

(3) $f \in C(I, I)$ is said to satisfy Axiom A if

(a) $f$ has a finite number of hyperbolic periodic attractors

(b) Every $x \in I$ is either a member of a (uniformly) hyperbolic set or is in the
basin of a hyperbolic periodic attractor.

$^1$ [33] is the best source of material I have seen for results involving one-dimensional dynamics.
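
The condition $|Df^n(p)| < 1$ for a hyperbolic periodic attractor is easy to check numerically. The following is a minimal sketch, assuming the logistic family $f_p(x) = px(1-x)$ at the illustrative parameter value $p = 3.2$, where the attractor is a period-2 orbit. The initial condition and iteration count are arbitrary choices. By the chain rule, $D(f^2)(x_1) = f'(x_2) f'(x_1)$, and for the period-2 orbit of this family the product works out analytically to $4 + 2p - p^2$.

```python
def f(p, x):
    """Logistic map f_p(x) = p * x * (1 - x)."""
    return p * x * (1.0 - x)


def df(p, x):
    """Derivative of the logistic map with respect to x."""
    return p * (1.0 - 2.0 * x)


p = 3.2                      # illustrative parameter value (assumed)
x = 0.3                      # arbitrary initial condition in (0, 1)
for _ in range(2000):        # discard the transient; the orbit settles onto the attractor
    x = f(p, x)

x1, x2 = x, f(p, x)          # the two points of the period-2 orbit
multiplier = df(p, x1) * df(p, x2)   # D(f^2) evaluated on the periodic orbit
print(x1, x2, multiplier, 4.0 + 2.0 * p - p * p)
```

Because the multiplier has absolute value less than one, the orbit attracts an open set of initial conditions, and the initial condition used here lies in its basin.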

The following theorem is the one-dimensional analog of theorem 2.4.3. 

Theorem 3.1.1 Suppose that $f \in C(I, I)$ ($r \ge 2$) satisfies Axiom A and the following
conditions:

(1) If $c \in I$ and $Df(c) = 0$, then $c \in C(f)$.

(2) $f^n(C(f)) \cap C(f) = \emptyset$ for all $n > 0$.

Then $f$ is $C^2$ structurally stable.

Proof: See for example, theorem III.2.5 in [33].

Axiom A maps are apparently prevalent in one-dimensional systems. For example, 
the following is believed to be true: 

Conjecture 3.1.1 The set of parameters for which $f_p(x) = px(1 - x)$ satisfies Axiom A
forms a dense set in $[0, 4]$.

Proof: de Melo and van Strien [33] report that Swiatek has recently proved this result
in [61].

Assuming that this result is true, we can paint an interesting picture for the
parameter space of $f_p(x) = px(1 - x)$. Apparently there is a dense set of parameter values for
which $f_p(x) = px(1 - x)$ has a hyperbolic periodic attractor. The set of parameter values
satisfying this property must consist of a union of open sets, since we know that these
systems are structurally stable.

On the other hand, this does not mean that all or almost all of the parameter space
of $f_p(x) = px(1 - x)$ is taken up by structurally stable systems. In fact, as we shall see in
section 3.4, a set of positive measure in the parameter space is actually taken up by systems
that are not structurally stable. These are the parameter values that we will be most
interested in.



3.2 Function shadowing 

We now consider function and parameter shadowing. In section 2.2 we saw that for
uniformly expansive diffeomorphisms, structural stability and function shadowing are
equivalent. For more general systems, structural stability still implies function
shadowing; however, the converse is not necessarily true. As we shall see, there are many cases
where the connections between shadowing orbits of nearby systems cannot be described 
by a simple homeomorphism. The structure of these connections can in fact be quite 
complicated. 



3.2.1 A function shadowing theorem 

There have been several recent results concerning shadowing properties of one-dimensional
maps. These include papers by Coven, Kan, and Yorke [17], Nusse and Yorke [39],
and Chen [12]. This section extends the shadowing results of these papers in order to
examine the possibility of parameter and function shadowing for parameterized families
of maps of the interval.

Specifically, we will deal with two types of maps: piecewise monotone mappings and
uniformly piecewise-linear mappings of a compact interval, $I \subset \mathbb{R}$, onto itself:

Definitions: A continuous map $f : I \to I$ is said to be piecewise monotone if $f$ has
finitely many turning points. $f$ is said to be a uniformly piecewise-linear mapping if it
can be written in the form:

$$f(x) = a_i \pm s x \quad \text{for } x \in [c_{i-1}, c_i] \qquad (3.1)$$

where $s > 1$, $c_0 < c_1 < \ldots < c_q$ and $q > 0$ is an integer. (We assume $s > 1$ because
otherwise there will not be any interesting behavior.)
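
As an illustration of form (3.1), the sketch below builds a uniformly piecewise-linear map from its breakpoints, offsets, and slope signs. The helper `piecewise_linear` and its argument names are hypothetical, and the code uses 0-based indexing, so piece `i` lives on `[cs[i], cs[i + 1]]`. The example instance is the full tent map on $[0, 1]$ with $s = 2$, chosen here as an assumption because it is continuous and maps the interval onto itself.

```python
def piecewise_linear(s, cs, a, sign):
    """Return f with f(x) = a[i] + sign[i] * s * x on [cs[i], cs[i + 1]]."""
    def f(x):
        for i in range(len(a)):
            if cs[i] <= x <= cs[i + 1]:
                return a[i] + sign[i] * s * x
        raise ValueError("x lies outside the interval")
    return f


s = 2.0
cs = [0.0, 0.5, 1.0]     # breakpoints c_0 < c_1 < c_2
a = [0.0, 2.0]           # offsets a_i, chosen so the two pieces meet at c_1
sign = [+1, -1]          # slope +s on the first piece, -s on the second

tent = piecewise_linear(s, cs, a, sign)

# Turning points are the interior breakpoints where the slope changes sign.
turning_points = [cs[i + 1] for i in range(len(sign) - 1) if sign[i] != sign[i + 1]]
print([tent(x) for x in (0.0, 0.25, 0.5, 0.75, 1.0)], turning_points)
```

Every piece has slope of absolute value $s$, which is what "uniformly" refers to in the definition; the single turning point of this instance is $c_1 = 1/2$.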

Note that for this section, it is useful to define neighborhoods, $B(x, \epsilon)$, so that they
do not extend beyond the confines of $I$. In other words, let $B(x, \epsilon) = (x - \epsilon, x + \epsilon) \cap I$.
With this in mind, we use the following definitions to describe some relevant properties
of piecewise monotone maps.

Definition: A piecewise monotone map, $f : I \to I$, is said to be transitive if for any
two open sets $U, V \subset I$, there exists an $n > 0$ such that $f^n(U) \cap V \ne \emptyset$.

Definitions: Let $f : I \to I$ be piecewise monotone. Then $f$ satisfies the linking property
if for every $c \in C(f)$ and any $\epsilon > 0$ there is a point $z \in I$ and integer $n > 0$ such that
$z \in B(c, \epsilon)$, $f^n(z) \in C(f)$, and $|f^i(c) - f^i(z)| < \epsilon$ for every $i \in \{1, 2, \ldots, n\}$. Suppose, in
addition, that we can always choose a $z \ne c$ such that the above condition is satisfied.
Then $f$ is said to satisfy the strong linking property.

We are now ready to state the main result of this section. 

Theorem 3.2.1 Transitive piecewise monotone maps satisfy the function shadowing
property in $C^0(I, I)$ if and only if they satisfy the strong linking property.

Proof: The proof may be found in appendix B.

In particular, this theorem implies the following parameter shadowing result. Let
$I_p \subset \mathbb{R}$ be a closed interval of the real line. Suppose that $\{f_p : I \to I \mid p \in I_p\}$ is a
continuously parameterized family of one-dimensional maps, and let $f_{p_0}$ be a transitive
piecewise monotone mapping with the strong linking property. Then $f_p$ must have the
parameter shadowing property at $p = p_0$. Note that $f_{p_0}$ is certainly not structurally
stable in $C^0(I, I)$.$^2$ The connections between the shadowing orbits are not continuous
and one-to-one in general. In the next section we shall further examine what these
connections are likely to look like.

Now, however, we would like to present some motivation for why theorem 3.2.1 
makes sense. The key to examining the shadowing properties of transitive piecewise 
monotone maps is to understand the dynamics near the turning points. In regions away 
from the turning points, these maps look locally hyperbolic, so finite pieces of orbits 
in these regions shadow each other rather easily. The transitivity condition guarantees 
hyperbolicity away from the turning points, since any transitive piecewise monotone
map is topologically conjugate to a uniformly piecewise linear map.

Close to the turning points, however, things are more interesting. Suppose, for
example, that we are given a family of piecewise monotone maps $f_p : I \to I$, and
suppose that we would like to find parameter shadowing orbits for orbits of $f_{p_0}$ that pass
near a turning point, $c$, of $f_{p_0}$. Consider a neighborhood, $U \subset I$, around the turning point
$c$. Regions of state space near $c$ are folded on top of each other by $f_{p_0}$ (see figure 3.1(a)).
This can create problems for parameter shadowing. Consider what the images of $U$ look
like under repeated applications of $f_{p_0}$ compared to what they might look like for two
other parameter values ($p_-$ and $p_+$) close to $p_0$ (see figure 3.1(b)). Under the different
parameter values, the forward images of $U$ become offset from each other, since orbits
for parameter values near $p_0$ look like pseudo-orbits of $f_{p_0}$.



2 In fact, no map is structurally stable in $C^0(I, I)$. This is clear, since any $C^0(I, I)$ neighborhood of
$f \in C^0(I, I)$ contains maps with arbitrary numbers of turning points. Since turning points are preserved
by topological conjugacy, $f$ cannot be structurally stable in $C^0(I, I)$.
The forward images of $U$ for different parameter values tend to consistently either
lag or lead each other, a phenomenon which has interesting consequences for parameter
shadowing. For example, in figure 3.1(b), since the forward images of $U$ under $f_{p_-}$ lag those
under $f_{p_0}$, it appears that $f_{p_-}$ has a difficult time shadowing the orbit of $f_{p_0}$ emanating
from the turning point, $c$. On the other hand, from the same figure, there is no reason to
expect that there are any orbits of $f_{p_0}$ which are not shadowed by suitable orbits of $f_{p_+}$.

However, this is not the end of the story. If the linking condition is satisfied, then 
the turning points are recurrent and neighborhoods of turning points keep returning to 
turning points to get refolded on top of themselves. This allows the orbits of lagging 
parameter values to catch up as regions get folded back (see figure 3.1(c)). In this case,
we see that the forward image of $U$ under $f_{p_0}$ gets folded back into the corresponding
forward image of $U$ under $f_{p_-}$, thus allowing orbits of $f_{p_-}$ to effectively shadow orbits
of $f_{p_0}$.

On the other hand we see that there is an asymmetry in the shadowing behavior of 
parameter values depending on whether the folded regions around turning points lag or
lead one another under the action of different parameter values. The parameter values 
that lag seem to have a more difficult time shadowing other orbits than the ones that lead. 
Making this statement more precise is the subject of the next section. Theorem 3.2.1 
merely states that if the strong linking condition is satisfied, then regions near turning 
points are refolded back upon one another in such a way that the parameter shadowing 
property is satisfied. 



3.2.2 An example: the tent map 

In [12], Chen proves the following theorem: 

Theorem 3.2.2 The pseudo-orbit shadowing property and the linking property are equiv- 
alent for transitive piecewise monotone maps. 

One interesting thing to note is the difference between function shadowing and 
pseudo-orbit shadowing. For instance, what happens when a transitive map exhibits 
the linking property but does not satisfy the strong-linking property? We already know 
that such maps must exhibit the pseudo-orbit shadowing property but must not satisfy 
the function shadowing property on $C^0(I, I)$. It is worth a brief look at why this occurs.

As an illustrative example, consider the family of tent maps, $f_p : [0,1] \to [0,1]$,
where:

$$f_p(x) = \begin{cases} p x & \text{if } x \leq \frac{1}{2} \\ p(1 - x) & \text{if } x > \frac{1}{2} \end{cases}$$

for $p \in [0, 2]$. Pick $p_0 \in (\sqrt{2}, 2)$ such that $f_{p_0}^5(\frac{1}{2}) = \frac{1}{2}$. It is not difficult to show that such a
$p_0$ exists. Numerically we find that one such value for $p_0$ occurs near $p_0 \approx 1.5128763969$.
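The quoted parameter value can be reproduced numerically. The following is a minimal sketch, not taken from the report: it assumes a bracket [1.50, 1.52] chosen by inspection (the tent family has other parameter values in the same range for which the turning point has period 5), and it applies plain bisection to g(p) = f_p^5(1/2) - 1/2. The names `tent`, `iterate`, and `find_p0` are illustrative.

```python
def tent(x, p):
    """Tent map f_p(x) on [0, 1] with turning point c = 1/2."""
    return p * x if x <= 0.5 else p * (1.0 - x)


def iterate(x, p, n):
    """Return f_p^n(x)."""
    for _ in range(n):
        x = tent(x, p)
    return x


def find_p0(lo=1.50, hi=1.52, tol=1e-13):
    """Bisection on g(p) = f_p^5(1/2) - 1/2 over an assumed bracket [lo, hi]."""
    def g(p):
        return iterate(0.5, p, 5) - 0.5

    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("bracket does not enclose a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                 # root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid    # root lies in [mid, hi]
    return 0.5 * (lo + hi)


p0 = find_p0()
print(f"p0 = {p0:.10f}")
```

Bisection is used here only because it needs no derivative information; any bracketing root finder would serve equally well.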
Figure 3.1: (a) illustrates how neighborhoods near a turning point get folded. (b)
shows what might happen for three different parameter values, $p_- < p_0 < p_+$. The images
of neighborhoods near the critical point tend to get offset from each other so that the
neighborhoods for certain parameters (e.g., $p_+$) may begin to lead while other parameters
(e.g., $p_-$) lag behind. Lagging parameters have difficulty shadowing leading parameters. (c)
shows how neighborhoods can get refolded on each other as a result of a subsequent encounter
with a turning point, allowing lagging parameters to "catch up," so that they are able to
shadow parameter values that normally lead.
We can see that $f_{p_0}$ is transitive on the interval $I(p_0) = [f_{p_0}^2(c), f_{p_0}(c)]$ where in this
case, $c = \frac{1}{2}$. Given any interval, $U \subset I(p_0)$, since $p_0 > \sqrt{2}$, if $c \notin U$ then $|f_{p_0}(U)| > \sqrt{2}\,|U|$
and if $c \in U$ then $|f_{p_0}(U)| > \frac{\sqrt{2}}{2}|U|$, where $|U|$ denotes the length of the interval $U$. Thus
either $|f_{p_0}^2(U)| > 2|U|$ or $f_{p_0}^2(U) = I(p_0)$, and for any $U \subset I(p_0)$ there exists a $k > 0$ such
that $f_{p_0}^k(U) = I(p_0)$. Consequently, $f_{p_0}$ must be transitive on $I(p_0)$. Note that even though
$I(p)$ is not invariant with respect to $p$, theorem 3.2.1 still applies, since we could easily
rescale the coordinates to eliminate this problem.



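Now let $p_0$ be near 1.5128763969 so that $f_{p_0}^5(c) = c = \frac{1}{2}$. We would like to investigate
the shadowing properties of the orbit, $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$. Let $f(x, p) = f_p(x)$. Two important
pieces of information are the following:

$$D_p f^5(c, p_0) = \frac{\partial f^5}{\partial p}(c, p_0) \approx -1.2715534 \tag{3.2}$$

$$\sigma_5(c, p_0) = -1 \tag{3.3}$$

where we define:

$$\sigma_n(c, p) = \begin{cases} 1 & \text{if } c \text{ is a relative maximum of } f_p^n \\ -1 & \text{if } c \text{ is a relative minimum of } f_p^n \end{cases}$$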

As we shall see in the next section, statistics like (3.2) and (3.3) are important
references in evaluating the shadowing behavior for families of maps. For this example,
let us consider a combined state and parameter space and examine how a small square
in this space around $(x, p) = (c, p_0)$ gets iterated by the map $f$. We see that because $f_{p_0}^5$
has a relative minimum at $c = \frac{1}{2}$ and because $D_p f^5(c, p_0)$ is negative, parameter values
higher than $p_0$ tend to lead while parameter values less than $p_0$ tend to lag behind in the
manner described earlier in this section. Since the turning point of $f_{p_0}$ at $c$ is periodic
with period 5, this type of lead/lag behavior continues for arbitrarily many iterates.
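The two statistics quoted for the tent map example can be checked directly. The sketch below is an illustration under stated assumptions rather than the report's own computation: it hard-codes the value 1.5128763969 quoted in the text, propagates the parameter derivative with the chain rule D_p f^{k+1} = D_x f(x_k, p) D_p f^k + D_p f(x_k, p), where x_k = f^k(c, p), and tracks whether f_p^n has a maximum or a minimum at c by multiplying the slope signs along the orbit. All function names are hypothetical.

```python
P0 = 1.5128763969   # parameter value quoted in the text
C = 0.5             # turning point of the tent map


def tent(x, p):
    return p * x if x <= 0.5 else p * (1.0 - x)


def dx_tent(x, p):
    """Slope of f_p at x (the left-branch value is used at x = 1/2)."""
    return p if x <= 0.5 else -p


def dp_tent(x, p):
    """Partial derivative of f_p(x) with respect to p."""
    return x if x <= 0.5 else 1.0 - x


def dp_iterate(p, n):
    """D_p f^n(c, p), accumulated with the chain rule along the orbit of c."""
    x, d = C, 0.0
    for _ in range(n):
        d = dx_tent(x, p) * d + dp_tent(x, p)
        x = tent(x, p)
    return d


def sigma(p, n):
    """+1 if f_p^n has a relative maximum at c, -1 if a relative minimum."""
    s, x = 1, tent(C, p)      # start from +1: c is a maximum of f_p itself
    for _ in range(n - 1):
        s *= 1 if dx_tent(x, p) > 0 else -1
        x = tent(x, p)
    return s


print(dp_iterate(P0, 5), sigma(P0, 5))
```

Only the first five iterates are used, so the discontinuity of the slope at the turning point is never evaluated after the starting point.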

We want to know if nearby maps, $f_p$, for $p$ near $p_0$ have orbits that shadow $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$.
Consider how the lead/lag behavior affects possible shadowing orbits. Because $c = \frac{1}{2}$ is
periodic, it is possible to verify that the quantity, $[\sigma_n(c, p_0^-) D_p f^n(c, p_0^-)]$, grows
exponentially as $n$ gets large (where $p_0^-$ indicates that we evaluate the derivative for $p$ arbitrarily
close to, but less than $p_0$). Thus for maps with parameter values $p < p_0$, all possible
shadowing orbits diverge away from $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$ at a rate that depends exponentially on
the number of iterates. Consequently there exists a $\delta > 0$ such that if $p \in (p_0 - \delta, p_0)$,
then no orbit of $f_p$ $\epsilon$-shadows $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$ for any $\epsilon > 0$ sufficiently small. On the other
hand the orbit $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$ can be shadowed by $f_p$ for parameter values $p \geq p_0$. In fact,
because everything is linear, it is not difficult to show that there must exist a constant
$K > 0$ such that for any $\epsilon > 0$, there is an orbit of $f_p$ that $\epsilon$-shadows $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$
if $p \in [p_0, p_0 + K\epsilon]$.

In summary, we see that the orbit, $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$, cannot be shadowed by parameter
values $p < p_0$, but can be shadowed for parameter values $p \geq p_0$. The map $f_{p_0}$ satisfies the
linking but not the strong linking property. Thus $f_{p_0}$ satisfies the pseudo-orbit shadowing
property, and any orbit of $f_p$ for $p$ near $p_0$ can be shadowed by an orbit of $f_{p_0}$. On the
other hand, $f_{p_0}$ does not satisfy function or parameter shadowing properties, since not all
nearby systems (for example, $f_p$ for $p < p_0$) have orbits that shadow orbits of $f_{p_0}$. Also,
note how the lead and lag behavior in parameter space results naturally in asymmetrical
shadowing properties in parameter space. We will look at this more closely in the next
section.

As a final note and preview for the next section, consider briefly how the above ex- 
ample might generalize to other situations. The tent map example may be considered 
exceptional for two primary reasons: (1) the tent map is uniformly hyperbolic everywhere
except at the turning point, and (2) the turning point of $f_{p_0}$ is periodic. We
are generally interested in more generic situations involving parameterized families of 
piecewise monotone maps, especially maps with positive Lyapunov exponents. Appar- 
ently a number of likely scenarios also result in lead/lag behavior in parameter space, 
producing asymmetries in shadowing behavior similar to that observed in the tent map 
example. However, this behavior generally gets distorted by local geometry. Also things 
become more complicated because of folding caused by close returns to turning points. 
In particular for maps with positive Lyapunov exponents, shadowing orbits for lagging 
parameter values tend to diverge away at exponential rates, just like in the tent map 
example, but this only occurs for a certain number of iterates until a close return or 
linking with a turning point occurs. In such cases, function shadowing properties may 
exist, but the geometry of the shadowing orbits still reflects the asymmetrical lead/lag 
behavior. This behavior certainly affects any attempts at parameter estimation. 



3.3 Asymmetrical shadowing 

In the previous two sections we were primarily interested in topologically-oriented re- 
sults about whether orbits of nearby one-dimensional systems shadow each other or not. 
However, topological results really do not provide enough information for us to draw any 
strong conclusions about the feasibility of estimation problems. Whether orbits shadow 
each other or not, in general we would also like to know the answers to more specific 
questions, for example: what is the expected rate of convergence for a parameter esti- 
mate, and how does the level of noise or measurement error affect the possible accuracy 
of a parameter estimate? 

In this section we address a more analytical treatment of the subject of shadowing 
and parameter dependence in one-dimensional maps. The problem with this, of course, 
is that there is an extremely rich variety of possible behavior in parameterized families 
of mappings, and it is difficult to say anything concrete without limiting the statements 
to relatively small classes of maps. Thus some compromises have to be made. However, 
we approach our investigation with some specific goals in mind. In particular we are 
interested in definite bounds on how fast the closest shadowing trajectories in nearby
systems diverge from each other and some explanation concerning how the observed
asymmetrical shadowing behavior gets established in the parameter space. We will
concentrate on smooth maps of the interval, especially the quadratic map,
$f_p(x) = p x (1 - x)$.

3.3.1 Lagging parameters 

In this subsection, we argue that asymmetries are likely to occur in parameter space. In 
particular, given a smooth piecewise monotone map with a positive Lyapunov exponent, 
shadowing orbits for nearby lagging maps tend to diverge away from orbits of the original 
system at an exponential rate before being folded back by close encounters with turning 
points. 

Preliminaries 

We will primarily restrict ourselves to maps with the following properties: 

(C0) $g : I \to I$ is piecewise monotone.

(C1) $g$ is $C^2$ on $I$.

(C2) Let $C(g)$ be the finite set such that $c \in C(g)$ if and only if $g$ has a local extremum
at $c \in I$. Then $g''(c) \neq 0$ if $c \in C(g)$ and $g'(x) \neq 0$ for all $x \in I \setminus C(g)$.

We are also interested in maps that have positive Lyapunov exponents. In particular,
we will examine maps satisfying a set of closely related properties known as the
Collet-Eckmann conditions, (CE1) and (CE2). We will say that a map $g$ satisfies (CE1) or
(CE2), if there exist constants $K_E > 0$ and $\lambda_E > 1$ such that for some $c \in C(g)$:

(CE1) $|Dg^n(g(c))| > K_E \lambda_E^n$,

(CE2) $|Dg^n(z)| > K_E \lambda_E^n$ if $g^n(z) = c$,

respectively for any $n > 0$.
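As a deliberately simple illustration of (CE1), one can accumulate log |Dg^n(g(c))| along the orbit of g(c). The sketch below uses the quadratic map g(x) = p x (1 - x) with p = 4; that choice of example is ours and is made only because the orbit of the turning point is then known in closed form (1/2 goes to 1, then to the fixed point 0). The function names are hypothetical, and a finite computation of this kind can suggest exponential growth but cannot prove it.

```python
import math


def quad(x, p):
    """Quadratic map g(x) = p x (1 - x)."""
    return p * x * (1.0 - x)


def dquad(x, p):
    """Derivative g'(x) = p (1 - 2x)."""
    return p * (1.0 - 2.0 * x)


def log_growth(p, n, c=0.5):
    """log |D g^n(g(c))|, summed along the orbit of g(c).

    Raises ValueError (log of zero) if the orbit lands exactly on the turning point.
    """
    x, total = quad(c, p), 0.0
    for _ in range(n):
        total += math.log(abs(dquad(x, p)))
        x = quad(x, p)
    return total


n = 20
rate = log_growth(4.0, n) / n   # per-iterate growth rate; compare with log(lambda_E)
print(rate, math.log(4.0))
```

For p = 4 every factor in the product has absolute value 4, so the growth rate equals log 4 and (CE1) holds with any $\lambda_E < 4$.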

We also consider one-parameter families of mappings, $f_p : I_x \to I_x$, parameterized by
$p \in I_p$, where $I_x \subset \mathbb{R}$ and $I_p \subset \mathbb{R}$ are closed intervals of the real line. Let $f(x, p) = f_p(x)$
where $f : I_x \times I_p \to I_x$. We are primarily interested in one-parameter families of maps
with the following characteristics:

(D0) For each $p \in I_p$, $f_p : I_x \to I_x$ satisfies (C0) and (C1). We also require that $C(f_p)$
remains invariant with respect to $p$ for all $p \in I_p$.




(D1) $f : I_x \times I_p \to I_x$ is $C^2$ for all $(x, p) \in I_x \times I_p$.

Note that the following notation will be used to express derivatives of $f(x, p)$ with respect
to $x$ and $p$.

$$D_x f(x, p) = \frac{\partial f}{\partial x}(x, p) \tag{3.4}$$

$$D_p f(x, p) = \frac{\partial f}{\partial p}(x, p). \tag{3.5}$$

The Collet-Eckmann conditions specify that derivatives with respect to the state,
$x$, grow exponentially. Similarly we will also be interested in families of maps where
derivatives with respect to the parameter, $p$, also grow exponentially. In other words,
we require that there exist constants $K_p > 0$, $\lambda_p > 1$, and $N > 0$ such that for some
$p_0 \in I_p$, and $c \in C(f_{p_0})$:

(CP1) $|D_p f^n(c, p_0)| > K_p \lambda_p^n$

for all $n > N$. From now on, given a parameterized family of maps, $\{f_p \mid p \in I_p\}$, we will
say that $f_{p_0}$ satisfies (CP1) if the above condition holds.

This may seem to be a rather strong constraint, but in practice it often follows
whenever (CE1) holds. We can see this by expanding with the chain rule:

$$D_p f^n(c, p_0) = D_x f(f^{n-1}(c, p_0), p_0)\, D_p f^{n-1}(c, p_0) + D_p f(f^{n-1}(c, p_0), p_0) \tag{3.6}$$

to obtain the formula for $D_p f^n(c, p_0)$:

$$D_p f^n(c, p_0) = D_p f(f^{n-1}(c, p_0), p_0) + \sum_{i=0}^{n-2} \Big[ D_p f(f^i(c, p_0), p_0) \prod_{j=i+1}^{n-1} D_x f(f^j(c, p_0), p_0) \Big].$$

Thus, if $|D_x f^n(f(c, p_0), p_0)|$ grows exponentially, we expect $|D_p f^n(c, p_0)|$ to also grow
exponentially unless the parameter dependence is degenerate in some way (e.g., if $f(x, p)$
is independent of $p$).
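The chain-rule recursion for the parameter derivative is easy to check numerically. The sketch below is an illustration only: it assumes the quadratic family f(x, p) = p x (1 - x) with c = 1/2, for which D_x f = p (1 - 2x) and D_p f = x (1 - x), and it compares the recursion with a central finite difference of f^n(c, p) in p. The parameter value 3.8, the iterate count 4, and the step size are arbitrary choices, and the function names are hypothetical.

```python
def quad(x, p):
    """Quadratic map f(x, p) = p x (1 - x)."""
    return p * x * (1.0 - x)


def iterate(p, n, c=0.5):
    """Return f^n(c, p)."""
    x = c
    for _ in range(n):
        x = quad(x, p)
    return x


def dp_iterate(p, n, c=0.5):
    """D_p f^n(c, p) from the chain-rule recursion."""
    x, d = c, 0.0
    for _ in range(n):
        # D_x f(x, p) = p (1 - 2x) multiplies the old derivative;
        # D_p f(x, p) = x (1 - x) is the new direct contribution.
        d = p * (1.0 - 2.0 * x) * d + x * (1.0 - x)
        x = quad(x, p)
    return d


def dp_finite_difference(p, n, h=1e-6):
    """Central-difference estimate of D_p f^n(c, p)."""
    return (iterate(p + h, n) - iterate(p - h, n)) / (2.0 * h)


p, n = 3.8, 4
analytic = dp_iterate(p, n)
numeric = dp_finite_difference(p, n)
print(analytic, numeric)
```

For larger n the finite difference loses accuracy quickly, because f^n(c, p) becomes a highly oscillatory function of p; the recursion does not suffer from this and is the natural way to monitor a condition such as (CP1).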

Now for any $c \in C(f_{p_0})$, define $\sigma_n(c, p)$ recursively as follows:

$$\sigma_{n+1}(c, p) = \operatorname{sgn}\{D_x f(f^n(c, p), p)\}\, \sigma_n(c, p) \tag{3.7}$$

where

$$\sigma_1(c, p) = \begin{cases} 1 & \text{if } c \text{ is a relative maximum of } f_p \\ -1 & \text{if } c \text{ is a relative minimum of } f_p \end{cases}$$

Basically $\sigma_n(c, p) = 1$ if $f_p^n$ has a relative maximum at $c$ and $\sigma_n(c, p) = -1$ if $f_p^n$ has a
relative minimum at $c$. We can use this notion to distinguish one direction in parameter
space from the other.
Definition: Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings satisfying
(D0) and (D1). Suppose that there exists $p_0 \in I_p$ such that $f_{p_0}$ satisfies (CE1) and
(CP1) for some $c \in C(f_{p_0})$. Then we say that the turning point, $c$, of $f_{p_0}$ favors higher
parameters if there exists $N' > 0$ such that

$$\operatorname{sgn}\{D_p f^n(c, p_0)\} = \sigma_n(c, p_0) \tag{3.8}$$

for all $n > N'$. Similarly, the turning point, $c$, of $f_{p_0}$ favors lower parameters if

$$\operatorname{sgn}\{D_p f^n(c, p_0)\} = -\sigma_n(c, p_0) \tag{3.9}$$

for all $n > N'$.

The first thing to notice about these two definitions is that they are exhaustive if
(CP1) is satisfied. That is, if (CP1) is satisfied for some $p_0 \in I_p$ and $c \in C(f_{p_0})$, then
the turning point, $c$, of $f_{p_0}$ either favors higher parameters or favors lower parameters.
We can see this from (3.6). Since $|D_p f(x, p_0)|$ is bounded for $x \in I_x$, if $|D_p f^n(c, p_0)|$
grows large enough then its sign is dominated by the signs of $D_x f(f^{n-1}(c, p_0), p_0)$ and
$D_p f^{n-1}(c, p_0)$, so that either (3.8) or (3.9) must be satisfied.
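A finite numerical check of the "favors higher/lower parameters" definition can be sketched as follows. This is an illustration under assumptions, not the report's procedure: it uses the quadratic family f(x, p) = p x (1 - x) at p = 4 (our choice, since the orbit of c = 1/2 is then 1/2, 1, 0, 0, ...), updates the sign indicator sigma_n and the parameter derivative D_p f^n(c, p) along the orbit, and compares signs for a finite range of n. A finite comparison can only suggest which direction is favored. The function names are hypothetical.

```python
def quad(x, p):
    return p * x * (1.0 - x)


def dx_quad(x, p):
    return p * (1.0 - 2.0 * x)


def sigma_and_dp(p, n_max, c=0.5):
    """Return ([sigma_1, ..., sigma_n_max], [D_p f^1, ..., D_p f^n_max]) at (c, p)."""
    sigmas, dps = [], []
    x, d, s = c, 0.0, 1            # sigma_1 = +1: c is a relative maximum of f_p
    for n in range(1, n_max + 1):
        if n > 1:
            # x holds f^{n-1}(c, p); multiply by the sign of the slope there
            s *= 1 if dx_quad(x, p) > 0 else -1
        d = dx_quad(x, p) * d + x * (1.0 - x)   # chain rule for D_p f^n
        x = quad(x, p)
        sigmas.append(s)
        dps.append(d)
    return sigmas, dps


def favored_direction(p, n_max, n_min=2):
    """'higher', 'lower', or 'undetermined', judged on n_min <= n <= n_max only."""
    sigmas, dps = sigma_and_dp(p, n_max)
    signs = [(d > 0) - (d < 0) for d in dps]
    tail = list(zip(signs, sigmas))[n_min - 1:]
    if all(sg == s for sg, s in tail):
        return "higher"
    if all(sg == -s for sg, s in tail):
        return "lower"
    return "undetermined"


sigmas, dps = sigma_and_dp(4.0, 5)
print(sigmas, dps, favored_direction(4.0, 8))
```

At p = 4 the derivative D_p f^n(c, p) is negative for every n >= 2 and grows like a power of 4, while f^n has a relative minimum at c, so the two signs agree.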

Finally, if $p_0 \in I_p$ and $c \in C(f_{p_0})$, then for any $\epsilon > 0$, define $n_\epsilon(c, \epsilon, p_0)$ to be the
smallest integer $n \geq 1$ such that $|f^n(c, p_0) - c^*| < \epsilon$ for any $c^* \in C(f_{p_0})$. We say that
$n_\epsilon(c, \epsilon, p_0) = \infty$ if no such $n \geq 1$ exists.
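The return time just defined can be computed by direct iteration. The sketch below is illustrative only: it assumes the quadratic family f(x, p) = p x (1 - x), which has the single turning point c = 1/2, reads the definition as the smallest n >= 1 for which the orbit of c comes within epsilon of c, and uses a finite iteration cap as a stand-in for "no such n exists". The function name and the cap are hypothetical.

```python
import math


def quad(x, p):
    return p * x * (1.0 - x)


def first_return_time(p, eps, c=0.5, max_iter=10_000):
    """Smallest n >= 1 with |f^n(c, p) - c| < eps.

    Returns math.inf if no return is seen within max_iter iterates, which is
    only a finite proxy for the case in which no such n exists.
    """
    x = c
    for n in range(1, max_iter + 1):
        x = quad(x, p)
        if abs(x - c) < eps:
            return n
    return math.inf


print(first_return_time(3.5, 0.01))   # orbit of c settles near a period-4 cycle close to c
print(first_return_time(4.0, 0.01))   # orbit of c is 1, 0, 0, ... and never returns
```

A large return time means that the orbit of the turning point stays away from the region where the state derivative is small, which is the regime in which the divergence estimates of this subsection apply for many iterates.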

Main result 

We are now ready to state the main result of this subsection.

Theorem 3.3.1 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (D0) and (D1). Suppose that (CP1) is satisfied for some $p_0 \in I_p$ and $c \in
C(f_{p_0})$. Suppose further that $f_{p_0}$ satisfies (CE1) at $c$, and that the turning point, $c$, favors
higher parameters under $f_{p_0}$. Then there exist $\delta p > 0$, $\lambda > 1$, $K' > 0$, and $K > 1$, such
that if $p \in (p_0 - \delta p, p_0)$, then for any $\epsilon > 0$, the orbit $\{f_{p_0}^k(c)\}_{k=0}^{\infty}$ is not $\epsilon$-shadowed by
any orbit of $f_p$ if $|p - p_0| > K' \epsilon \lambda^{-n_\epsilon(c, K\epsilon, p_0)}$.

The analogous result also holds if f Po favors lower parameters. 

Proof: The proof of this result can be found in appendix C. 

The proof is actually relatively straightforward, although the details of the analysis
become a bit tedious. The basic idea is that away from the turning points, everything is
hyperbolic, and we can uniformly bound derivatives with respect to state and parameters
to grow at an exponential rate. In particular, the lagging behavior for lower parameters
is preserved and becomes exponentially more pronounced with increasing numbers of
iterates. Shadowing orbits for parameters $p < p_0$ diverge away exponentially fast if
higher parameters are favored. However, this only works for orbits that don't return
closely to the turning points where derivatives are small.
3.3.2 Leading parameters

Motivation 

We have shown in the previous section that if $f : I_x \times I_p \to I_x$ is a one-parameter
family of maps of the interval and if there exists $N > 0$ such that

$$D_p f^n(c, p_0) > \sigma_n(c, p_0) K \lambda^n \tag{3.10}$$

for all $n > N$, then for $p < p_0$, orbits of $f_p$ tend to diverge at an exponential rate away
from orbits of $f_{p_0}$ that pass near the turning point, $c$. Such orbits of $f_{p_0}$ can only be
shadowed by orbits of $f_p$ for $p < p_0$ if the orbits of $f_{p_0}$ are folded back upon themselves
by a subsequent encounter with the turning point.

On the other hand, we would like to find a condition like (3.10) under which orbits
of $f_p$ for $p \geq p_0$ can shadow any orbit of $f_{p_0}$ indefinitely without relying on folding.
This type of phenomenon is indicated by numerical experiments on a variety of systems. 
Unfortunately however, the derivative condition in (3.10) is local, so we have little con- 
trol over the long term behavior of orbits. Thus, we must replace this condition with 
something that acts over an interval in parameter space. 

For instance, we are interested in addressing systems like the family of quadratic 
maps: 

$$f(x, p) = p x (1 - x). \tag{3.11}$$

It is known that the family of quadratic maps in (3.11) satisfies a property known as 
the monotonicity of kneading invariants in the parameter space of f p . This condition 
is sufficient to make one direction in parameter space preferred over the other. We 
show in this subsection that monotonicity of the kneading invariant, along with (CE1), is
sufficient to guarantee strong shadowing effects for parameters that lead, at least in
the case of unimodal (one turning point) maps with negative Schwarzian derivative, a
class of maps that includes (3.11). Maps with negative Schwarzian derivative have been
the focal point of considerable research over the last several years, since they represent 
some of the simplest smooth maps which have interesting dynamical properties. We 
take advantage of analytical tools developed recently in order to analyze the relevant 
shadowing properties. 

Definitions and statement of results 

Definition: Suppose that $g : I \to I$ is $C^3$ and $I \subset \mathbb{R}$. Then the Schwarzian derivative,
$Sg$, of $g$ is given by the following:

$$Sg(x) = \frac{g'''(x)}{g'(x)} - \frac{3}{2} \left( \frac{g''(x)}{g'(x)} \right)^2$$

where $g'(x)$, $g''(x)$, $g'''(x)$ here indicate the first, second, and third derivatives of $g$.
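As a worked instance, the Schwarzian derivative of the quadratic map can be evaluated directly. The sketch below assumes the standard definition Sg = g'''/g' - (3/2)(g''/g')^2 and the map g(x) = p x (1 - x), for which g' = p (1 - 2x), g'' = -2p, and g''' = 0, so that Sg(x) = -6 / (1 - 2x)^2 away from the turning point. The function names and sample points are hypothetical.

```python
def schwarzian(d1, d2, d3):
    """Sg(x) computed from the values g'(x), g''(x), g'''(x); requires g'(x) != 0."""
    return d3 / d1 - 1.5 * (d2 / d1) ** 2


def schwarzian_quadratic(x, p):
    """Schwarzian derivative of g(x) = p x (1 - x), valid for x != 1/2."""
    d1 = p * (1.0 - 2.0 * x)   # g'(x)
    d2 = -2.0 * p              # g''(x)
    d3 = 0.0                   # g'''(x)
    return schwarzian(d1, d2, d3)


samples = [0.05, 0.25, 0.4, 0.6, 0.9]
values = [schwarzian_quadratic(x, 3.8) for x in samples]
print(values)
```

The result does not depend on p and is negative wherever it is defined, tending to minus infinity at the turning point, which is consistent with treating the quadratic family as a family of maps with negative Schwarzian derivative.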
In this section we will primarily restrict ourselves to mappings with the following
properties:

(A0) $g : I \to I$ is $C^3(I)$ where $I = [0, 1]$, with $g(0) = 0$ and $g(1) = 0$.

(A1) $g$ has one local maximum at $x = c$; $g$ is strictly increasing on $[0, c]$ and strictly
decreasing on $[c, 1]$.

(A2) $g''(c) < 0$, $|g'(0)| > 1$.

(A3) The Schwarzian derivative of $g$ is negative, $Sg(x) < 0$, over all $x \in I$ (we allow
$Sg(x) = -\infty$).

Again we will be investigating one-parameter families of mappings, $f : I_x \times I_p \to I_x$,
where $p$ is the parameter and $I_x, I_p \subset \mathbb{R}$ are closed intervals. Let $f_p(x) = f(x, p)$ where
$f_p : I_x \to I_x$. We are primarily interested in one-parameter families of maps with the
following characteristics:

(B0) For each $p \in I_p$, $f_p : I_x \to I_x$ satisfies (A0), (A1), (A2), and (A3) where $I_x = [0, 1]$.
For each $p$, we also require that $f_p$ has a turning point at $c$, where $c$ is constant
with respect to $p$.

(B1) $f : I_x \times I_p \to I_x$ is $C^2$ for all $(x, p) \in I_x \times I_p$.

Another concept we shall need is that of the kneading invariant. Kneading invariants 
and many associated topics are discussed in Milnor and Thurston [34]. 

Definition: If $g : I \to I$ is a piecewise monotone map with exactly one turning point
at $c$, then the kneading invariant, $D(g, t)$, of $g$ is defined as follows:

$$D(g, t) = 1 + \theta_1(g) t + \theta_2(g) t^2 + \cdots + \theta_n(g) t^n + \cdots$$

where

$$\theta_n(g) = \epsilon_1(g) \epsilon_2(g) \cdots \epsilon_n(g)$$

$$\epsilon_n(g) = \lim_{x \to c} \operatorname{sgn}(Dg(g^n(x)))$$

for $n \geq 1$. If $c$ is a relative maximum of $g$, then one interpretation of $\theta_n(g)$ is that it
represents whether $g^{n+1}$ has a relative maximum ($\theta_n(g) = +1$) or minimum ($\theta_n(g) = -1$)
at $c$.

We can also order these kneading invariants in the following way. We will say that
$|D(g, t)| < |D(h, t)|$ if $\theta_i(g) = \theta_i(h)$ for $1 \leq i < n$, but $\theta_n(g) < \theta_n(h)$. A kneading
invariant, $D(f_p, t)$, is said to be monotonically decreasing with respect to $p$ if $p_1 > p_0$
implies $|D(f_{p_1}, t)| \leq |D(f_{p_0}, t)|$.
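The coefficients of the kneading invariant and the ordering can be illustrated numerically. The sketch below makes several assumptions that are ours rather than the report's: it uses the quadratic map g(x) = p x (1 - x), whose derivative is positive to the left of c = 1/2 and negative to the right, it takes the sign of the derivative at g^n(c) itself in place of the limit (which is only valid when the orbit of c does not land exactly on c), and it compares finite truncations of the coefficient sequences. The function names are hypothetical.

```python
def quad(x, p):
    return p * x * (1.0 - x)


def kneading_coefficients(p, n_max, c=0.5):
    """[theta_1, ..., theta_n_max] for g(x) = p x (1 - x).

    Assumes g^n(c) != c for 1 <= n <= n_max, so that the sign of g' at g^n(c)
    is well defined: +1 to the left of c, -1 to the right.
    """
    thetas, theta, x = [], 1, c
    for _ in range(n_max):
        x = quad(x, p)                  # x = g^n(c)
        eps_n = 1 if x < c else -1
        theta *= eps_n                  # theta_n = eps_1 * ... * eps_n
        thetas.append(theta)
    return thetas


def compare(theta_g, theta_h):
    """-1 if g precedes h in the ordering, +1 if it follows, 0 if the truncations agree."""
    for a, b in zip(theta_g, theta_h):
        if a != b:
            return -1 if a < b else 1
    return 0


k_low = kneading_coefficients(3.5, 5)
k_high = kneading_coefficients(4.0, 5)
print(k_low, k_high, compare(k_high, k_low))
```

In this example the truncated sequences first differ in the third coefficient, and the larger parameter has the smaller invariant, which is consistent with a kneading invariant that decreases monotonically with p.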

We are now ready to state the main result of this subsection: 

Theorem 3.3.2 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that $p_0 \in I_p$ such that $f_{p_0}$ satisfies (CE1). Also,
suppose that the kneading invariant, $D(f_p, t)$, is monotonically decreasing with respect
to $p$ in some neighborhood of $p = p_0$. Then there exists $\delta p > 0$ and $C > 0$ such that for
every $x_0 \in I_x$ there is a set, $W(x_0) \subset I_x \times I_p$, satisfying the following conditions:

(1) $W(x_0) = \{(\alpha_{x_0}(t), \beta_{x_0}(t)) \mid t \in [0, 1]\}$ where $\alpha_{x_0} : [0, 1] \to I_x$ and $\beta_{x_0} : [0, 1] \to I_p$ are
continuous and $\beta_{x_0}(t)$ is monotonically increasing with respect to $t$ with $\beta_{x_0}(0) = p_0$
and $\beta_{x_0}(1) = p_0 + \delta p$.

(2) For any $x_0 \in I_x$, if $(x, p) \in W(x_0)$ then $|f^n(x, p) - f^n(x_0, p_0)| < C (p - p_0)^{1/3}$ for all
$n \geq 0$.



Proof: See appendix D 

Corollary 3.3.1 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that $p_0 \in I_p$ such that $f_{p_0}$ satisfies (CE1). Also,
suppose that the kneading invariant, $D(f_p, t)$, is monotonically decreasing with respect
to $p$ in some neighborhood of $p = p_0$. Then there exists $\delta p > 0$ and $C > 0$ such that if
$p \in [p_0, p_0 + \delta p]$, then for any $\epsilon > 0$, every orbit of $f_{p_0}$ is $\epsilon$-shadowed by an orbit of $f_p$ if
$|p - p_0| < C \epsilon^3$.

Proof: This is an immediate consequence of theorem 3.3.2. 

Overview of proof 

We now outline some of the ideas behind the proof of theorem 3.3.2. The proof 
depends on an examination of the structure of the preimages of the turning point, x = c, 
in the combined space of state and parameters (I x X I p space). The basic idea is to find 
connected shadowing sets in state-parameter space. These sets have the property that 
points in the set shadow each other under arbitrarily many applications of /. Certain 
geometrical properties of these sets can be determined by squeezing the sets between 
structures of preimage points. In order to discuss the approach further, we first need to
introduce some notation. 

We consider the set of preimages, $P(n) \subset I_x \times I_p$, satisfying:

$$P(n) = \{(x, p) \mid f^i(x, p) = c \text{ for some } 0 \leq i \leq n\}.$$
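For a fixed parameter value, the slice of the preimage set P(n) can be computed by inverting the map repeatedly. The sketch below is an illustration under assumptions of ours: it uses the quadratic family f(x, p) = p x (1 - x) with c = 1/2 and p > 0, reads the index range in the definition as 0 <= i <= n, and obtains the two inverse branches from the quadratic formula x = (1 +/- sqrt(1 - 4y/p)) / 2. The function names are hypothetical.

```python
import math


def quad(x, p):
    return p * x * (1.0 - x)


def preimages_of_c(p, n, c=0.5):
    """Sorted list of x in [0, 1] with f^i(x, p) = c for some 0 <= i <= n (p fixed, p > 0)."""
    level, found = [c], [c]
    for _ in range(n):
        next_level = []
        for y in level:
            disc = 1.0 - 4.0 * y / p
            if disc < 0.0:              # y > p/4 has no real preimage
                continue
            r = math.sqrt(disc)
            # a set removes the duplicate when both branches coincide (r == 0)
            next_level.extend({0.5 * (1.0 - r), 0.5 * (1.0 + r)})
        found.extend(next_level)
        level = next_level
    return sorted(found)


def reaches_c(x, p, n, c=0.5, tol=1e-9):
    """True if some iterate f^i(x, p), 0 <= i <= n, lies within tol of c."""
    for _ in range(n + 1):
        if abs(x - c) < tol:
            return True
        x = quad(x, p)
    return False


points = preimages_of_c(3.8, 3)
print(len(points), points)
```

Repeating the computation over a grid of parameter values traces out the curves in state-parameter space whose structure is discussed in this overview.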
It is also useful to have a way of specifying a particular section of path-connected
preimages, $R(n, x_0, p_0) \subset P(n)$, extending from a single point, $(x_0, p_0) \in P(n)$. Let us define
$R(n, x_0, p_0)$ so that $(x', p') \in R(n, x_0, p_0)$ if and only if $(x', p') \in P(n)$ and there exists a
continuous function, $g : I_p \to I_x$, such that $g(p_0) = x_0$, $g(p') = x'$, and

$$\{(x, p) \mid x = g(p),\ p \in [p_0; p']\} \subset P(n),$$

where $[p_0; p']$ may denote either $[p_0, p']$ or $[p', p_0]$, whichever is appropriate.

The first step is to investigate the basic structure of $P(n)$. We show that $P(n)$
contains no regions or interior points and that $P(n)$ cannot contain any isolated points
or curve segments. Instead, each point in $P(n)$ must be part of a continuous curve
that stretches for the length of the parameter space, $I_p$. In fact, if $(x_0, p_0) \in P(n)$, then
$R(n, x_0, p_0) \cap (I_x \times \{\sup I_p\}) \neq \emptyset$ and $R(n, x_0, p_0) \cap (I_x \times \{\inf I_p\}) \neq \emptyset$.

The next step is to demonstrate that if the kneading invariant of $f_p$, $D(f_p, t)$, is
monotonically decreasing (or increasing), then $P(n)$ has a special topology. It must
take on a tree-like structure so that as we travel along one direction in parameter space,
branches of $P(n)$ must either always merge or always split away from each other. For
example if $D(f_p, t)$ is monotonically decreasing, then branches of $P(n)$ can only split
away from each other as we increase the parameter $p$. In other words, $R(n, y_-, p_0)$ and
$R(n, y_+, p_0)$ do not intersect each other in the space, $I_x \times \{p\}$, for $p > p_0$ if $y_+ \neq y_-$
and $y_+, y_- \in I_x$.

Now suppose we want to examine the points that shadow $(x_0, p_0)$ under the action
of $f$ given any $x_0 \in I_x$. We first develop bounds on derivatives for differentiable sections
of $R(n, x_0, p_0)$. We then use knowledge about the behavior of $R(n, x_0, p_0)$ to bound the
behavior of the shadowing points. We demonstrate that for maps, $f_p$, with kneading
invariants that decrease monotonically in parameter space, there exist constants $C > 0$
and $\delta p > 0$ such that if $x_0 \in I_x$ and

$$U(p) = \{x \mid |x - x_0| < C (p - p_0)^{1/3}\} \tag{3.12}$$

for any $p \in I_p$, then for any $p' \in [p_0, p_0 + \delta p]$, there exists $x'_+ \in U(p')$ such that
$(x'_+, p') \in R(n_+, y_+, p_0)$ for some $y_+ > x_0$ and $n_+ > 0$ assuming that $f^{n_+}(y_+, p_0) = c$.
Likewise there exists $x'_- \in U(p')$ such that $(x'_-, p') \in R(n_-, y_-, p_0)$ for some $y_- < x_0$ and
$n_- > 0$ where $f^{n_-}(y_-, p_0) = c$.

However, setting $n = \max\{n_+, n_-\}$, and since $R(n, y_-, p_0)$ and $R(n, y_+, p_0)$ do not
intersect each other for $p > p_0$ and $y_- \neq y_+$, we also know that for any $y_- < y_+$,
there is a region in $I_x \times I_p$ space bounded by $R(n, y_-, p_0)$, $R(n, y_+, p_0)$, and $p \geq p_0$.
Take the limit of this region as $y_- \to x_0^-$, $y_+ \to x_0^+$, and $n \to \infty$. Call the resulting
region $S(x_0)$. We observe that $S(x_0)$ is a connected set that is invariant under $f$ and
is nonempty for every parameter value $p \in I_p$ such that $p \geq p_0$ (by invariant we mean
that $f(S(x_0)) = S(f(x_0, p_0))$). Thus, since $S(x_0)$ is bounded by (3.12), there exists a set
of points, $S(x_0)$, in combined state and parameter space that shadow any trajectory,
$\{f_{p_0}^k(x_0)\}_{k=0}^{\infty}$, of $f_{p_0}$. Finally we observe that there exists a subset of $S(x_0)$ that can be
represented by the form given for $W(x_0)$.



3.4 Example: quadratic map 

In this section we examine how the results of section 3.3 apply to the quadratic map,
$f_p : [0, 1] \to [0, 1]$, where:

$$f_p(x) = p x (1 - x) \tag{3.13}$$

and $p \in [0, 4]$. For the rest of this section, $f_p$ will refer to the map given in (3.13), and
$f(x, p) = f_p(x)$ for any $(x, p) \in I_x \times I_p$ where $I_x = [0, 1]$ and $I_p = [0, 4]$.

We have already seen in conjecture 3.1.1 that there appears to be a dense set of
parameters in $I_p$ for which $f_p$ is structurally stable and has a hyperbolic periodic attractor.
However, by the following result, we find that there is also a large set of parameters for
which $f_p$ satisfies the Collet-Eckmann conditions and is not structurally stable:

Theorem 3.4.1 Let $E$ be the set of parameter values, $p$, such that (CE1) is satisfied for
the family of quadratic maps, $f_p$, given in (3.13). Then $E$ is a set of positive Lebesgue
measure. Specifically, $E$ has a density point at $p = 4$ so that:

$$\lim_{\epsilon \to 0} \frac{\lambda(E \cap [4 - \epsilon, 4])}{\epsilon} = 1 \tag{3.14}$$

where $\lambda(S)$ represents the Lebesgue measure of the set $S$.

Proof: The first proof of this result was given in [5]. The reader should also consult
the proof given in [33] (see footnote 3).

Apparently, if we pick a parameter, $p_0$, at random from $I_p$ (with uniform distribution
on $I_p$) there is a positive probability that $f_{p_0}$ will satisfy (CE1). We might note that
numerical evidence suggests that the set of parameters, $p$, resulting in maps, $f_p$, which
satisfy (CE1) is not just concentrated in a small neighborhood of $p = 4$.

In any case, applying the results of the last section, we see that for a positive measure 
of parameter values, there is a definite asymmetry with respect to shadowing results in 
parameter space. The following theorem illustrates this fact. 



3 These two references actually deal with the family of maps, $g_a(x) = 1 - a x^2$, where $a$ is the
parameter. However, the maps $g_a$ and $f_p$ are topologically conjugate if $a = (p^2 - 2p)/4$. The conjugating
homeomorphism in this case is simply a linear function. Thus the results in the references immediately
apply to the family of quadratic maps, $f_p : I_x \to I_x$ for $p \in I_p$.
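The conjugacy between the two quadratic families can be verified numerically. The sketch below is ours and rests on stated assumptions: substituting the affine change of coordinates x = h(y) = ((p - 2)/4) y + 1/2, which sends the turning point y = 0 of g_a to the turning point x = 1/2 of f_p, into f_p(h(y)) = h(g_a(y)) yields the parameter relation a = (p^2 - 2p)/4, and that is the relation the code assumes (it gives a = 2 at p = 4). The relation is only usable for p different from 2, where h is invertible. The function names are hypothetical.

```python
def f(x, p):
    """Quadratic map f_p(x) = p x (1 - x)."""
    return p * x * (1.0 - x)


def g(y, a):
    """Quadratic map g_a(y) = 1 - a y^2."""
    return 1.0 - a * y * y


def h(y, p):
    """Affine change of coordinates x = h(y); maps y = 0 to x = 1/2."""
    return 0.25 * (p - 2.0) * y + 0.5


def conjugacy_residual(p, y):
    """f_p(h(y)) - h(g_a(y)) with a = (p^2 - 2p)/4; zero if h conjugates g_a to f_p."""
    a = 0.25 * (p * p - 2.0 * p)
    return f(h(y, p), p) - h(g(y, a), p)


ys = [-1.0 + 0.1 * k for k in range(21)]
worst = max(abs(conjugacy_residual(3.7, y)) for y in ys)
print(worst)
```

Because the conjugacy is affine, statements about measure and density points in the parameter a carry over directly to the parameter p.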
Theorem 3.4.2 Let $I_p = [0, 4]$, $I_x = [0, 1]$, and $f_p : I_x \to I_x$ be the family of quadratic
maps such that $f_p(x) = p x (1 - x)$ for $p \in I_p$. Then there exist constants $\delta > 0$, $C > 0$,
$K > 0$, and a set $E(\gamma) \subset I_p$ with positive Lebesgue measure for every $\gamma > 1$ such that:

(1) If $\gamma > 1$ and $p_0 \in E(\gamma)$, then $f_{p_0}$ satisfies (CE1).

(2) If $f_{p_0}$ satisfies (CE1), then for any $\epsilon > 0$ sufficiently small, any orbit of $f_{p_0}$ can be
$\epsilon$-shadowed by an orbit of $f_p$ for $p \in [p_0, p_0 + C \epsilon^3]$.

(3) If $\gamma > 1$ and $p_0 \in E(\gamma)$, then for any $\epsilon > 0$, almost no orbits of $f_{p_0}$ can be
$\epsilon$-shadowed by any orbit of $f_p$ for $p \in (p_0 - \delta, p_0 - (K\epsilon)^{\gamma})$. That is, the set of possible
initial conditions, $x_0 \in I_x$, such that the orbit $\{f_{p_0}^k(x_0)\}_{k=0}^{\infty}$ can be $\epsilon$-shadowed
by some orbit of $f_p$ comprises at most a set of Lebesgue measure zero on $I_x$ if
$p \in (p_0 - \delta, p_0 - (K\epsilon)^{\gamma})$.

Proof of theorem 3.4.2: The full proof for this result can be found in appendix E.

Before we take a look at an overview of the proof for theorem 3.4.2, it is useful to
make a few remarks. First of all, one might wonder whether the asymmetrical situation
in theorem 3.4.2 is really generic for all $p_0 \in I_p$ such that $f_{p_0}$ satisfies (CE1). For
example, are there other parameter values in $I_p$ for which it is easier to shadow lower
parameter values than it is to shadow higher parameter values? Numerical evidence
indicates that most if not all $p \in I_p$ exhibit asymmetrical shadowing properties if $f_p$ has
positive Lyapunov exponents. Furthermore, it seems that these parameter values favor
the same specific direction in parameter space. In fact it is easy to show analytically
that condition (2) of theorem 3.4.2 actually holds for all $p_0 \in I_p$ for which $f_{p_0}$ satisfies
(CE1). In other words, for $f_{p_0}$ satisfying (CE1), there exists $C > 0$ such that for any
$\epsilon > 0$ sufficiently small, any orbit of $f_{p_0}$ can be $\epsilon$-shadowed by an orbit of $f_p$ if
$p \in [p_0, p_0 + C \epsilon^3]$.

We now outline the strategy for the proof of theorem 3.4.2. For parts (1) and
(3) we basically want to combine theorem 3.3.1 and theorem 3.4.1 in the appropriate
way. There are four major steps. We first bound the return time of the orbit of the
turning point, $c = \frac{1}{2}$, to neighborhoods of $c$. Next we show that $f_p$ satisfies (CP1) and
favors higher parameters on a set of parameter values of positive measure. This allows us to
apply theorem 3.3.1. Finally we show that almost every orbit of these maps approaches
arbitrarily close to $c$ so that if the orbit, $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$, cannot be shadowed then almost
all other orbits of $f_{p_0}$ cannot be shadowed either.

We bound the return time of the orbit of the turning point, $c$, to neighborhoods of $c$ by
examining the proof of theorem 3.4.1. Specifically, as part of the proof of theorem 3.4.1,
Benedicks and Carleson [5] show that for any $\alpha > 0$, there is a set of positive measure
in parameter space, $S(\alpha) \subset I_p$, such that if $p_0 \in S(\alpha)$ then $f_{p_0}$ satisfies (CE1) and the
condition:

$$|f^i_{p_0}(c) - c| > e^{-\alpha i} \qquad (3.15)$$

for all $i \in \{0, 1, 2, \ldots\}$. The set, $S(\alpha)$, has a density point at parameter value $p = 4$.

Next we show that $f_p$ satisfies (CP1) and favors higher parameters on a subset of $S(\alpha)$
of positive measure. This is basically done by looking at what happens for $p = 4$ and
extrapolating that result for parameters in a small interval in parameter space around
$p = 4$. The result only works for those values of $p$ for which $f_p$ satisfies (CE1). However,
since $p = 4$ is a density point of $S(\alpha)$, for any $\alpha > 0$, there is a set, $S^*(\alpha)$, contained in
a neighborhood of $p = 4$ with a density point at $p = 4$ for which $p_0 \in S^*(\alpha)$ implies $f_{p_0}$ satisfies
(CE1) and (3.15), and $f_p$ favors higher parameters and satisfies (CP1) at $p = p_0$.

Then by applying theorem 3.3.1 we see that there exist constants $\delta > 0$, $K_0 > 0$ and
$K_1 > 0$ such that for any $\alpha > 0$, if $p_0 \in S^*(\alpha)$ then the orbit, $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$, cannot be
shadowed by any orbit of $f_p$ for $p \in (p_0 - \delta, p_0 - K_0 \epsilon \lambda^{-n_\epsilon(c, K_1\epsilon, p_0)})$ (recall that $n_\epsilon(c, \epsilon, p_0)$
is defined to be the smallest integer $n \geq 1$ such that $|f^n(c, p_0) - c| \leq \epsilon$). By controlling
$\alpha > 0$ in (3.15) we can effectively control $n_\epsilon(c, \epsilon, p_0)$ to be whatever we want. Thus
for any $\gamma > 1$ we can choose a set $E(\gamma) \subset I_p$ with a density point at $p = 4$ such
that if $p_0 \in E(\gamma)$ then $f_{p_0}$ satisfies (CE1) and no orbits of $f_p$ $\epsilon$-shadow the orbit,
$\{f^i_{p_0}(c)\}_{i=0}^{\infty}$, for any $p \in (p_0 - \delta, p_0 - K_0(K_1\epsilon)^\gamma)$. But since $\gamma > 1$, if we set the constant
$K = \max\{K_0 K_1, K_1\} > 0$ we see that $p_0 - K_0(K_1\epsilon)^\gamma \geq p_0 - (K\epsilon)^\gamma$ for any $\epsilon > 0$. Thus,
no orbits of $f_p$ may $\epsilon$-shadow $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$ if $p \in (p_0 - \delta, p_0 - (K\epsilon)^\gamma)$.

Finally it is known that if $f_{p_0}$ satisfies (CE1) then almost every orbit of $f_{p_0}$ approaches
arbitrarily close to $c$. Thus for almost all $x_0 \in I_x$, the orbit, $\{f^i_{p_0}(x_0)\}_{i=0}^{\infty}$, cannot be
shadowed by an orbit of $f_p$ if the orbit, $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$, cannot be shadowed by any orbit
of $f_p$. Consequently, we see that for any $\gamma > 1$ if $p_0 \in E(\gamma)$ then $f_{p_0}$ satisfies (CE1) and
almost no orbits of $f_{p_0}$ can be shadowed by any orbit of $f_p$ if $p \in (p_0 - \delta, p_0 - (K\epsilon)^\gamma)$.
This would prove parts (1) and (3) of the theorem.

Part (2) of theorem 3.4.2 is a direct result of corollary 3.3.1 and the following result, 
due to Milnor and Thurston [34]: 

Lemma 3.4.1 The kneading invariant, $D(f_p, t)$, is monotonically decreasing with
respect to $p$ for all $p \in I_p$.

Thus if $f_{p_0}$ satisfies (CE1) for some $p_0 \in E(\gamma)$, there exists a constant $C > 0$ such that
any orbit of $f_{p_0}$ can be $\epsilon$-shadowed by an orbit of $f_p$ if $p \in [p_0, p_0 + C\epsilon^3]$. This proves
part (2) of the theorem.



3.5 Remarks on convergence of parameter estimates 

In order to determine the feasibility of parameter estimation applications, it is important 
to have some idea about how many state samples are likely to be needed in order to attain 
a certain accuracy in the parameter estimate. Ergodic theory comes into play here, since
we would like to consider the behavior of typical orbits. In particular, suppose that a data
stream is generated from an initial condition that is chosen at random after the system 
has settled into its equilibrium behavior. We would like to estimate the rate at which a 
parameter estimate is likely to converge with increasing numbers of measurements from 
the data stream. In this section, we outline ideas on how to approach this question, and 
make certain conjectures about convergence results. These conjectures closely match 
numerical results attained from actual parameter estimation techniques as shown in 
chapter 6 of this report. 

We have already seen that the accuracy of a parameter estimate for a piecewise 
monotone map depends on how close the orbit being sampled comes to the turning 
points of the map. When an orbit comes close to a turning point, nearby regions in 
state space are subject to a folding effect that enables us to distinguish small differences 
in parameters based on state data. With a given level of measurement noise, $\epsilon$, there
often exists a lower limit on the parameter estimation accuracy resulting from folding
and refolding effects near turning points (see theorem 3.2.1). This bound is related to
the amount of time it takes for an orbit near a turning point to return within $\epsilon$ distance
of a turning point. For most numerical purposes, however, this lower limit is often too 
small to be of practical importance. Thus, it is important to consider the approximate 
rate at which a parameter estimate is likely to converge, before the system reaches the 
lower limit in the accuracy of the parameter estimate. 

Assuming that a family of piecewise monotone maps, $\{f_p \mid p \in I_p \subset \mathbb{R}\}$, has the same
number of turning points for all $p \in I_p$, this turns out to be equivalent to asking the
following question: Given a typical orbit, $\{x_n\}_{n=1}^{N}$, of $f_{p_0}$ (with $p_0 \in I_p$), as $N$ increases,
for what parameter values, $p$, do there exist shadowing orbits, $\{y_n(p)\}_{n=1}^{N}$ of $f_p$, such
that $y_n(p)$ and $x_n$ lie on the same monotone branch of $f_{p_0}$ for each $n \in \{1, 2, \ldots, N\}$? In
other words, if $c_1 < c_2 < \ldots < c_m$ are the turning points of $f_p$ for all $p \in I_p$, then for
any $n \in \{1, 2, \ldots, N\}$, we require that $x_n \in [c_i, c_{i+1}]$ implies $y_n(p) \in [c_i, c_{i+1}]$. This makes
sense because the lower limit in the accuracy of the parameter estimate results from
the fact that orbits can shadow each other by evolving on different monotone branches,
so that state space regions around an orbit for a map with leading parameters get
refolded more than regions around shadowing orbits for maps with lagging parameters.
Henceforth, given the family, $f_p$, of piecewise monotone maps described above, we will
say that a sequence of points, $\{y_n\}_{n=1}^{N}$, $\epsilon$-monotone-shadows an orbit $\{x_n\}_{n=1}^{N}$ of $f_{p_0}$
if $y_n$ and $x_n$ lie on the same monotone branch of $f_{p_0}$ for each $n \in \{1, 2, \ldots, N\}$ and if
$|y_n - x_n| < \epsilon$ for each $n \in \{1, 2, \ldots, N\}$.
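The definition of $\epsilon$-monotone-shadowing can be sketched as a direct check on two finite sequences. This is only an illustration: the function names are hypothetical, and assigning a point that sits exactly on a turning point to the right-hand branch is a convention chosen here, not something the text specifies.

```python
from bisect import bisect_right

def branch_index(x, turning_points):
    """Index of the monotone branch containing x.

    turning_points must be sorted ascending. A point exactly on a turning
    point is assigned to the branch on its right (an assumed convention).
    """
    return bisect_right(turning_points, x)

def monotone_shadows(y, x, eps, turning_points):
    """True if the sequence y eps-monotone-shadows the sequence x.

    Both conditions of the definition are checked at every index:
    |y_n - x_n| < eps, and y_n, x_n lie on the same monotone branch.
    """
    if len(y) != len(x):
        return False
    return all(
        abs(yn - xn) < eps
        and branch_index(yn, turning_points) == branch_index(xn, turning_points)
        for yn, xn in zip(y, x)
    )
```

For a unimodal map with turning point 1/2 one would call `monotone_shadows(y, x, eps, [0.5])`; a sequence that stays within `eps` of `x` but crosses the turning point at some index is rejected.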

Using these ideas, we make the following conjectures: 

Conjecture 1: Consider the family of tent maps, $\{g_p : I_x \to I_x \mid p \in I_p\}$, where $I_x = [0,1]$,

$$g_p(x) = \begin{cases} px & \text{if } x \leq \frac{1}{2} \\ p(1 - x) & \text{if } x > \frac{1}{2} \end{cases}$$

and $I_p$ = (|, 2]. Given $\epsilon > 0$ and $p_0 \in I_p$, for almost all $x_0 \in I_x$ there is a constant
$K > 0$ such that for each positive integer $N$, there exists a $p \in (p_0 - \frac{K}{N}, p_0]$ such that
the orbit $\{g^n_{p_0}(x_0)\}_{n=0}^{N}$ is not $\epsilon$-monotone-shadowed by any orbit of $g_p$.

Numerical results indicate that the error in the estimate of the
parameters of the tent map tends to converge at a rate proportional to $\frac{1}{N}$, where $N$ is
the number of observations. Similarly we have:

Conjecture 2: Consider the family of quadratic maps, $\{f_p : I_x \to I_x \mid p \in I'_p\}$, where
$I_x = [0,1]$,

$$f_p(x) = px(1 - x) \qquad (3.16)$$

and $I'_p = (2,4]$. Then there exists a set $E \subset I'_p$ of positive Lebesgue measure such that if
$p_0 \in E$, then given $\epsilon > 0$, for almost all $x_0 \in I_x$, there is a constant $K > 0$ such that for
each positive integer $N$, there exists a $p \in (p_0 - \frac{K}{N^2}, p_0]$ such that the orbit $\{f^n_{p_0}(x_0)\}_{n=0}^{N}$
is not $\epsilon$-monotone-shadowed by any orbit of $f_p$.

Furthermore, we expect that the error in the parameter estimate of the quadratic
map should converge at a rate proportional to $\frac{1}{N^2}$, where $N$ is the number of observations
processed. In chapter 6, we will see that this appears to agree with numerical results.

The rest of this section will be devoted to motivating these two conjectures. In order
to estimate the convergence rate of the parameter estimate, we first need an estimate of
how fast an orbit is likely to approach a turning point. It turns out that the maps we
are interested in are ergodic so that the long term average behavior of almost all orbits
of the maps can be described by the appropriate invariant measure of the map. Thus,
in order to estimate how fast most orbits approach a turning point of a map, it is helpful
to examine the invariant measures of the map.

$\mu$ is said to be an invariant measure of the map $h : I_x \to I_x$ if $\mu(h^{-1}(A)) = \mu(A)$
for any open set $A \subset I_x$. Every ergodic map, $h : I_x \to I_x$, has an associated invariant
measure, $\mu$, such that for any continuous function $\phi : I_x \to \mathbb{R}$, the relation,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \phi(h^n(x_0)) = \int_{x \in I_x} \phi(x)\, \mu(dx)$$

holds for $\mu$-almost all $x_0 \in I_x$. Thus, one might say that the "time-average" equals
the "space-average" of an ergodic map. The density, $\frac{d\mu}{d\lambda}$, of the measure $\mu$ satisfies the
property that $\int_{x \in A} \frac{d\mu}{d\lambda}(x)\, \lambda(dx) = \mu(A)$ for any open $A \subset I_x$ (see footnote 4).
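The equality of time and space averages can be illustrated numerically. The sketch below uses the quadratic map at $p = 4$, whose invariant density $1/(\pi\sqrt{x(1-x)})$ is a standard fact not stated in this report; by symmetry the space average of $\phi(x) = x$ is $\frac{1}{2}$. The function names, the initial condition, and the orbit length are arbitrary choices made for illustration.

```python
def logistic(x, p=4.0):
    """Quadratic map f_p(x) = p x (1 - x)."""
    return p * x * (1.0 - x)

def time_average(phi, x0, n_steps, p=4.0):
    """Average of phi along the first n_steps points of the orbit of x0."""
    total, x = 0.0, x0
    for _ in range(n_steps):
        total += phi(x)
        x = logistic(x, p)
    return total / n_steps

if __name__ == "__main__":
    # Time average of phi(x) = x; compare with the space average 1/2.
    print(time_average(lambda x: x, 0.1234, 100_000))
```

The comparison is only approximate: a floating-point orbit is itself a pseudo-orbit, which is the kind of issue the shadowing discussion in this chapter is concerned with.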

Conjecture 1: 

Let us now outline the motivation behind Conjecture 1. The tent map, $g_p$, is ergodic
if $p \in I_p$. The density, $\frac{d\mu_p}{d\lambda}$, of the associated invariant measure $\mu_p$ of $g_p$ is simply a
constant over the region, $[g_p^2(c), g_p(c)]$. We expect that for $\mu_p$-almost all initial conditions
$x_0 \in [0, 1]$ there exists a $K > 0$ such that:

$$\min_{0 \leq i \leq n} |g_p^i(x_0) - c| < K \frac{1}{n}$$

is satisfied for all $n > 0$ if $p \in I_p$.

4 For more information regarding invariant measures and ergodic theory of maps of the interval please
refer to chapter V in [33].

Keeping this observation in mind, let $c = \frac{1}{2}$ be the critical point of the tent map.
One can show that $g_p$ favors higher parameters for all $p \in I_p$. In other words, using the
same notation as in (3.7) and (3.8), we know that

$$\mathrm{sgn}\{D_p g^n(c,p)\} = \sigma_n(c,p) \qquad (3.17)$$

for all $n \geq 1$ (see footnote 5). It is also not difficult to show that there exist constants $K_1 > 0$ and
$K_2 > 0$ such that

$$K_1 p^n < |D_p g^n(c,p)| < K_2 p^n \qquad (3.18)$$

for all $n \geq 1$ if $p \in I_p$.
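A minimal numerical sketch of (3.17) and (3.18) follows. It relies on an assumption that should be checked against (3.7) and (3.8), which are not reproduced in this section: $\sigma_n(c,p)$ is taken to be the sign of $D_x g^{n-1}$ evaluated at $g(c,p)$, with $\sigma_1 = +1$. The derivative $D_p g^n(c,p)$ is accumulated by the chain rule, $D_p g^n = D_x g(x_{n-1})\, D_p g^{n-1} + D_p g(x_{n-1})$. The function name and the value $p = 1.8$ are illustrative only.

```python
def tent_param_derivative(p, n_max, c=0.5):
    """Return (derivs, signs) with derivs[n-1] = D_p g^n(c, p).

    signs[n-1] is the ASSUMED sigma_n: the product of the signs of the
    branch slopes visited by the critical orbit after the first step.
    """
    x, a, sigma = c, 0.0, 1
    derivs, signs = [], []
    for n in range(1, n_max + 1):
        left = x <= 0.5
        dx = p if left else -p          # D_x g at the current point
        dp = x if left else 1.0 - x     # D_p g at the current point
        if n > 1:
            sigma *= 1 if left else -1  # orientation of the branch just used
        a = dx * a + dp                 # chain rule for D_p g^n(c, p)
        x = p * x if left else p * (1.0 - x)
        derivs.append(a)
        signs.append(sigma)
    return derivs, signs

if __name__ == "__main__":
    derivs, signs = tent_param_derivative(1.8, 20)
    for n, (a, s) in enumerate(zip(derivs, signs), start=1):
        # Sign agreement illustrates (3.17); the ratio illustrates (3.18).
        print(n, s * a > 0, abs(a) / 1.8 ** n)
```

Under this reading, the ratio $|D_p g^n(c,p)| / p^n$ staying between two positive constants is what (3.18) asserts.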

Now given $p_0 \in I_p$ and an initial condition $x_0 \in [0, 1]$, consider the finite orbit
$\{g^n_{p_0}(x_0)\}_{n=0}^{N}$. We would like to determine if there is an orbit of $g_p$ that $\epsilon$-monotone-
shadows this orbit for $p < p_0$. To first order, this is basically determined by the magnitude
of

$$\Delta_N = \min_{0 \leq n \leq N} |g^n_{p_0}(x_0) - c| \qquad (3.19)$$

because regions of state space near the critical point $c$ get folded, producing the leading
and lagging behavior which in turn leads to asymmetrical shadowing in parameter space.
Since the tent map favors higher parameters, $g_p$ cannot $\epsilon$-monotone-shadow the orbit
$\{g^n_{p_0}(c + \Delta_N)\}_{n=1}^{m}$ for $p < p_0$ if:

$$\sigma_n(c,p)\,[g^n_{p_0}(c + \Delta_N) - g^n_p(c)] > \epsilon \qquad (3.20)$$

for any $n \leq m$. Suppose that the inequality in (3.20) is false for all $n \leq m - 1$. Then,
from (3.17) and (3.18),

$$\sigma_m(c,p)\,[g^m_{p_0}(c + \Delta_N) - g^m_p(c)] > K_1 p^m (p_0 - p) - p_0^m \Delta_N.$$

Now suppose that

$$m = \log_{p_0} \frac{\epsilon}{\Delta_N}. \qquad (3.21)$$

We find that:

$$\sigma_m(c,p)\,[g^m_{p_0}(c + \Delta_N) - g^m_p(c)] > \epsilon \left[ K_1 \left(\frac{p}{p_0}\right)^m \left(\frac{p_0 - p}{\Delta_N}\right) - 1 \right]$$

So for $\Delta_N$ sufficiently small and $p_0 \in I_p$, we can see that the inequality in (3.20) holds
for the value $m$ given in (3.21) if $p_0 - p > 3\Delta_N$.

5 Actually $g_p$ favors higher parameters for all $p \in (1, 2]$. We confine our discussion here to $p \in I_p$ =
(|, 2] for convenience since (3.17) may only hold for $n > N_0$ for some $N_0 > 1$ if $p \in$ (1, §]. However, we
suspect that Conjecture 1 also holds for any $p \in (1, 2]$.

Thus, given sufficiently small $\Delta_N$ as defined in (3.19), there exists a $p \in (p_0 - 3\Delta_N, p_0]$
such that no orbit of $g_p$ can $\epsilon$-monotone-shadow the orbit $\{x_n\}_{n=0}^{N}$. Recall, however,
that $\Delta_N$ should decrease at a rate at least proportional to $\frac{1}{N}$. Applying this fact, we get
a result similar to Conjecture 1.
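The quantity $\Delta_N$ that drives this argument is easy to track along a computed orbit. The following sketch records the closest approach to the critical point as $N$ grows; the function names and the parameter value $p = 1.9$ are illustrative assumptions, and a floating-point orbit is of course only a pseudo-orbit of the tent map.

```python
def tent(x, p):
    """Tent map g_p on [0, 1] with critical point c = 1/2."""
    return p * x if x <= 0.5 else p * (1.0 - x)

def closest_approach(x0, p, n_max, c=0.5):
    """Return [Delta_0, ..., Delta_n_max] with
    Delta_N = min over 0 <= n <= N of |g_p^n(x0) - c|."""
    deltas, best, x = [], float("inf"), x0
    for _ in range(n_max + 1):
        best = min(best, abs(x - c))
        deltas.append(best)
        x = tent(x, p)
    return deltas

if __name__ == "__main__":
    deltas = closest_approach(0.3, 1.9, 10_000)
    for N in (10, 100, 1_000, 10_000):
        # If Delta_N decays like 1/N, the product N * Delta_N stays bounded.
        print(N, deltas[N], N * deltas[N])
```

Printing $N \Delta_N$ for increasing $N$ gives a rough check of the expected $\frac{1}{N}$ decay for a single orbit; it is not a substitute for the measure-theoretic statement.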

Conjecture 2: 

Now let us consider Conjecture 2. The basic idea here is similar to Conjecture
1. However, Conjecture 2 presents some additional complications. First the invariant
measures for the quadratic map are more complicated, and cannot be written in closed
form. Second, there is no uniform expansion available in state or parameter space, so
that it is not a simple matter to bound the quantity $f^n_{p_0}(c + \Delta_N) - f^n_p(c)$ for small $\Delta_N$
and for $p$ near $p_0$.

The invariant measures of the quadratic map have been the subject of vigorous
research over the past several years. Nowicki and van Strien show in [46] that for the
maps given in (3.16) if $f_{p_0}$ satisfies (CE1) for some $p_0 \in (2,4]$, then $f_{p_0}$ has an ergodic
invariant measure $\mu$ and there exists a constant $K > 0$ such that for any measurable set
$A \subset [0, 1]$, $\mu(A) < K|A|^{\frac{1}{2}}$ (where $|A|$ is the Lebesgue measure of the set $A$).



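Now consider the interval $A_\epsilon = (c - \frac{1}{2}\epsilon, c + \frac{1}{2}\epsilon)$. Note that there exists $K' > 0$ such
that for any $\epsilon > 0$, $|f_{p_0}(A_\epsilon)| < K'\epsilon^2$. Thus, from Nowicki and van Strien's result, we know
that there exists $K_1 > 0$ such that for any $\epsilon > 0$:

$$\mu(A_\epsilon) = \mu(f_{p_0}(A_\epsilon)) < K|f_{p_0}(A_\epsilon)|^{\frac{1}{2}} < K_1 \epsilon$$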
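The quadratic bound on the image of the interval around the critical point can be checked directly. The short derivation below assumes that the interval is centered at $c = \frac{1}{2}$ with half-width $\frac{\epsilon}{2}$; the offset $t$ is a symbol introduced here for illustration.

```latex
f_{p_0}\!\left(\tfrac{1}{2} \pm t\right)
  = p_0\left(\tfrac{1}{2} + t\right)\left(\tfrac{1}{2} - t\right)
  = \tfrac{p_0}{4} - p_0 t^2,
  \qquad 0 \le t < \tfrac{\epsilon}{2},
\\[4pt]
\text{so}\quad
f_{p_0}(A_\epsilon) = \left(\tfrac{p_0}{4} - \tfrac{p_0 \epsilon^2}{4},\ \tfrac{p_0}{4}\right],
\qquad
|f_{p_0}(A_\epsilon)| = \tfrac{p_0}{4}\,\epsilon^2 \le \epsilon^2
\quad\text{for } p_0 \le 4 .
```

Under that assumption any constant $K' > 1$ satisfies the strict inequality, and the square root in the Nowicki and van Strien estimate turns the $\epsilon^2$ length back into a bound linear in $\epsilon$.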

Furthermore, it is fairly easy to show that there also exists $K_2 > 0$ such that for any
$\epsilon > 0$, $\mu(A_\epsilon) > K_2 \epsilon$. Thus, since $K_2 \epsilon < \mu(A_\epsilon) < K_1 \epsilon$ for any $\epsilon$, we expect that for almost
all initial conditions $x_0 \in [0, 1]$, the quantity,

$$\Delta_N(x_0) = \min_{0 \leq n \leq N} |f^n_{p_0}(x_0) - c| \qquad (3.22)$$

will decay at a rate proportional to $\frac{1}{N}$.

As in Conjecture 1, given $p_0 \in I_p$, $x_0 \in I_x$, and the finite orbit, $\{f^n_{p_0}(x_0)\}_{n=0}^{N}$, of
$f_{p_0}$, we would like to determine if there is an orbit of $f_p$ that $\epsilon$-monotone-shadows this
orbit for some $p < p_0$. As before, the important statistic to know is $\Delta_N(x_0)$ (we will
henceforth assume that $x_0$ is fixed and refer simply to $\Delta_N$).

As in Conjecture 1, $f_p$ cannot $\epsilon$-shadow the orbit $\{f^i_{p_0}(c + \Delta_N)\}_{i=1}^{m}$ for $p < p_0$ if:

$$\sigma_i(c,p)\,[f^i_{p_0}(c + \Delta_N) - f^i_p(c)] > \epsilon \qquad (3.23)$$

for any $i \leq m$. This corresponds to what happens when an orbit, $\{f^i_{p_0}(c + \Delta_N)\}_{i=0}^{N}$,
of $f_{p_0}$ leads the orbit of the critical point of the map $f_p$ by more than $\epsilon$ so there is
no orbit of $f_p$ that can effectively shadow that orbit. The other way that $f_p$ can fail
to $\epsilon$-monotone-shadow the orbit $\{f^i_{p_0}(c + \Delta_N)\}_{i=0}^{N}$ is if $f^i_p(c)$ lags behind $f^i_{p_0}(c + \Delta_N)$
(by less than $\epsilon$), but $f^i_p(c)$ and $f^i_{p_0}(c + \Delta_N)$ are on different monotone branches (i.e., the
critical point, $c = \frac{1}{2}$, is between $f^i_p(c)$ and $f^i_{p_0}(c + \Delta_N)$).

Thus, to prove the conjecture it is sufficient to show that given $\epsilon > 0$ sufficiently
small, there exist constants $K > 0$ and $C > 0$ such that for each $\Delta_N > 0$ there exists a
$p > p_0 - K\Delta_N^2$ and $i < -C \log \Delta_N^2$ such that one of the following is satisfied:

(1) $\sigma_i(c,p)\,[f^i_{p_0}(c + \Delta_N) - f^i_p(c)] > \epsilon$

(2) $\sigma_i(c,p)\,[f^i_{p_0}(c + \Delta_N) - f^i_p(c)] > 0$ and $\mathrm{sgn}\{c - f^i_{p_0}(c + \Delta_N)\} = -\mathrm{sgn}\{c - f^i_p(c)\}$.

The problem is getting an estimate for $f^i_{p_0}(c + \Delta_N) - f^i_p(c)$. Recall that near $p = 4$ there
is a set $E \subset (2,4]$ of positive Lebesgue measure such that for each $p_0 \in E$, $f_{p_0}$ satisfies
(CE1), (CP1), and favors higher parameters. Thus if $p_0 \in E$, there exists a $K_0 > 0$ and
$N_0 > 0$ such that

$$\frac{1}{K_0} < \frac{D_p f^i(c, p_0)}{D_x f^{i-1}(f(c, p_0), p_0)} < K_0 \qquad (3.24)$$

for all $i \geq N_0$. So, if $p_0 \in E$, for $p < p_0$ and each $i \geq N_0$ we have that:

$$\begin{aligned}
\sigma_i(c,p)\,&[f^i_{p_0}(c + \Delta_N) - f^i_p(c)] \\
&= \sigma_i(c,p)\,[(f^i_{p_0}(c) - f^i_p(c)) - (f^i_{p_0}(c) - f^i_{p_0}(c + \Delta_N))] \\
&\geq \sigma_i(c,p)\,[(D_p f^i(c, p_0)(p_0 - p) + O((p_0 - p)^2)) \\
&\qquad - (K' D_x f^{i-1}(f(c, p_0), p_0)\,\Delta_N^2 + O(\Delta_N^3))] \\
&\geq \frac{1}{K_0}\,|D_x f^{i-1}(f(c, p_0), p_0)|\,[(p_0 - p) - K_0 K' \Delta_N^2 + O(\Delta_N^3) + O((p_0 - p)^2)]
\end{aligned} \qquad (3.25)$$

For each $i \geq N_0$, the left hand side of (3.23) tends to grow as $(p_0 - p) - K_0 K' \Delta_N^2$, at least
for small $\Delta_N$ and $p_0 - p$. Recall that $D_x f^{i-1}(f(c, p_0), p_0)$ tends to grow exponentially
with $i$ and $\Delta_N$ tends to decay proportional to $\frac{1}{N}$. Thus, given $\epsilon > 0$ one might expect
that there exist $K > 0$ and $C > 0$ such that either condition (1) or (2) is satisfied for
some $p > p_0 - K\Delta_N^2$ and $i < -C \log \Delta_N^2$.

This, however, is a somewhat rough calculation, and in order to demonstrate that
either condition (1) or (2) is satisfied, we need to bound the higher order terms in
(3.25). This involves getting a uniform estimate of the relationship between $D_p f^i(c, p_0 - \delta p)$
and $D_x f^{i-1}(f(c + \delta x, p_0), p_0)$ for small values of $\delta p$ and $\delta x$ as $i$ increases. This does
not appear to be a trivial task and is something that should be looked into more carefully in
the future.



3.6 Conclusions, remarks, and future work

The primary goal of this chapter was to examine how shadowing works in one-dimensional 
maps in order to evaluate the feasibility of parameter estimation on simple chaotic sys- 
tems. We have been particularly interested in investigating how nonlinear folding affects 
parameter shadowing and how this might help explain numerical results which show 
asymmetrical behavior in the parameter space of one-dimensional maps. More specifically,
for a parameterized family of maps, $f_p$, it is apparently the case that an orbit for a
particular parameter value, $p = p_0$, is often shadowed much more readily by maps with
slightly higher parameter values than by maps with slightly lower parameter values (or
vice versa). This phenomenon has important effects on the possibilities for parameter
estimation. For example, if we are given noisy observations of the orbit described above
and asked what the parameter value was of the map that produced that data, then we
would immediately be able to eliminate most values less than $p_0$ as possible candidates
for the actual parameter value. On the other hand, it may be much more difficult to
distinguish $p_0$ from parameter values slightly larger than $p_0$.

For piecewise monotone maps with positive Lyapunov exponents, we demonstrated 
that the folding behavior around a turning point generally leads to asymmetrical behav- 
ior, unless the parameter dependence is degenerate in some way. In particular, images 
of neighborhoods of a turning point under $f_p$ tend to separate exponentially fast for
perturbations in $p$. This results in a sort of lead-lag phenomenon as the images for different
parameter values separate, causing images for some parameter values to overlap each 
other more than others. Near the turning point, orbits for parameter values that lag 
behind cannot shadow orbits for the parameter values that lead unless another folding 
occurs because of a subsequent approach to a turning point. 

For the case of unimodal families of maps with negative Schwarzian derivative, the
result is sharper. Apparently, if the parameter dependence is not degenerate, and if
a map, $f_{p_0}$, has positive Lyapunov exponents for some parameter value, $p_0$, then for
any $\epsilon > 0$ sufficiently small, there exists $C > 0$ so that for one direction in parameter
space (either $p > p_0$ or $p < p_0$), all orbits of $f_{p_0}$ can be $\epsilon$-shadowed by an orbit of
$f_p$ if $|p - p_0| < C\epsilon^3$. Meanwhile, in the other direction in parameter space, there exist
constants $\delta > 0$ and $K > 0$ so that for any $\gamma > 1$ there is a positive Lebesgue measure of
parameter values such that if $|p - p_0| < \delta$, then almost no orbits of $f_{p_0}$ can be $\epsilon$-shadowed
by any orbit of $f_p$ if $|p - p_0| > (K\epsilon)^\gamma$. This clearly illustrates some sort of preference of
direction in parameter space.

One might also note that this result demonstrates that all orbits of certain chaotic
(nonperiodic) systems can be shadowed by orbits of systems dominated by hyperbolic
periodic attractors (consider, for example, the quadratic map, $f_p(x) = px(1 - x)$).
Shadowing results have sometimes been cited to justify the use of computers in analyzing
dynamical systems, since if one numerically iterates an orbit and finds that it is chaotic,
then similar real orbits must exist in that system (or nearby systems). This is true, but
one should also be careful, because the real orbits that shadow a numerically generated
trajectory are often purely pathological (i.e., such orbits are often not qualitatively similar
to typical orbits of the system).

In any case, many questions related to this material still remain unanswered. It 
seems to be quite difficult to come up with crisp general results when it comes to a 
general topic like parameter dependence in families of maps. For instance, I do not 
know of a simple way of characterizing exactly when parameter shadowing favors one 
direction over the other in parameter space for piecewise monotone maps. For unimodal 
maps, it appears that perhaps a useful connection to topological entropy may be made. 
If topological entropy is monotonic, and if there is a change in the topological entropy
of the map $f_p$ with respect to $p$ at $p = p_0$ then certain asymmetrical shadowing results seem
likely for orbits of $f_{p_0}$. However, topological entropy does not appear to be an ideal
indicator for asymmetrical shadowing, since it is global in nature. On the other hand, 
if a piecewise monotone map has multiple turning points, it is possible for some turning 
points to favor higher parameters while other turning points favor lower parameters. 
Such examples are interesting, from a parameter estimation point of view, because that 
means that one may be able to effectively squeeze parameter estimates within a narrow 
band of uncertainty as the orbit being sampled passes close to turning points which favor 
different directions in parameter space.



Chapter 4

General nonuniformly hyperbolic 
systems 



In this chapter we examine shadowing behavior for general one-parameter families of
$C^2$ diffeomorphisms, $f_p : M \to M$ for $p \in \mathbb{R}$, where $M$ is a smooth compact manifold.
We want to consider why orbits shadow each other (or fail to shadow each other) in 
maps that are nonuniformly hyperbolic. This is important to investigate so that we 
can properly evaluate the feasibility of parameter estimation in a wide class of chaotic 
systems. 

The exposition in this chapter will not be rigorous. Most of the arguments will 
be qualitative in nature. Our goal here is to motivate some possible mechanisms that 
might help explain results from numerical experiments. In particular we will attempt 
to draw analogies to our work in chapter 3 to help explain what may be happening in 
multi-dimensional systems. 



4.1 Preliminaries 

Let us first outline some basic concepts.

We start by introducing the notion of Lyapunov exponents. Let $f : M \to M$ be a
$C^2$ diffeomorphism. Suppose that $M$ is a compact $q$-dimensional manifold and that for
some $x \in M$ there exist subspaces, $\mathbb{R}^q = E_x^1 \supset E_x^2 \supset \cdots$ in the tangent space of $M$ at $x$
such that:

$$\lambda_x^i = \lim_{n \to \infty} \frac{1}{n} \log |Df^n(x) u| \quad \text{if } u \in E_x^i \setminus E_x^{i+1}$$

for some numbers $\lambda_x^1 > \lambda_x^2 > \ldots$. Then the $\lambda_x^i$'s are the Lyapunov exponents of the
orbit, $\{f^i(x)\}$. Oseledec's Multiplicative Ergodic Theorem ([48]) demonstrates that for
any $f$-invariant probability measure, $\mu$, these Lyapunov exponents exist for $\mu$-almost all
$x \in M$.
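The limit defining the largest Lyapunov exponent can be approximated by pushing a tangent vector along an orbit and renormalizing it at every step. The sketch below uses the Hénon map with the commonly quoted parameters $a = 1.4$, $b = 0.3$ purely as an illustrative planar diffeomorphism; it is not one of the systems analyzed in this report, it does not live on a compact manifold, and the function names, initial point, and transient length are assumptions.

```python
import math

def henon(x, y, a=1.4, b=0.3):
    """One step of the Henon map (x, y) -> (1 - a x^2 + y, b x)."""
    return 1.0 - a * x * x + y, b * x

def largest_lyapunov(n_steps, x=0.0, y=0.0, a=1.4, b=0.3, n_transient=1000):
    """Estimate lim (1/n) log |Df^n(x) u| for a generic tangent vector u."""
    for _ in range(n_transient):          # discard the initial transient
        x, y = henon(x, y, a, b)
    ux, uy = 1.0, 0.0                     # initial tangent vector
    total = 0.0
    for _ in range(n_steps):
        # Apply the Jacobian Df(x, y) = [[-2 a x, 1], [b, 0]] to u.
        ux, uy = -2.0 * a * x * ux + uy, b * ux
        norm = math.hypot(ux, uy)
        total += math.log(norm)           # accumulate the log growth factor
        ux, uy = ux / norm, uy / norm     # renormalize to avoid overflow
        x, y = henon(x, y, a, b)
    return total / n_steps

if __name__ == "__main__":
    print(largest_lyapunov(20_000))
```

Renormalizing at each step does not change the limit, since the logarithms of the per-step growth factors sum to $\log |Df^n(x) u|$.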

If there are no $\lambda_x^i$'s equal to zero, then there exist local stable manifolds at $x$ tangent
to the linear subspace, $E_x^i$, if $\lambda_x^i < 0$. There also exists an analogous unstable manifold.
In other words, for almost any $x \in M$ there exists an $\epsilon > 0$ such that:

$$W^s_\epsilon(x, f) = \{y \in M : d(f^n(x), f^n(y)) < \epsilon \text{ for all } n \geq 0\}$$
$$W^u_\epsilon(x, f) = \{y \in M : d(f^{-n}(x), f^{-n}(y)) < \epsilon \text{ for all } n \geq 0\}$$

These manifolds are locally as differentiable as $f$. This result is based on Pesin [52] and
Ruelle [54]. The difference between these manifolds and manifolds for the uniformly
hyperbolic case is that these manifolds do not have to exist everywhere, the angles
between the manifolds can approach zero, and the neighborhoods, $\epsilon$, can be arbitrarily
small for different $x \in M$.

We can also define global stable and unstable manifolds as follows:

$$W^s(x, f) = \{y \in M : d(f^n(x), f^n(y)) \to 0 \text{ as } n \to \infty\}$$
$$W^u(x, f) = \{y \in M : d(f^{-n}(x), f^{-n}(y)) \to 0 \text{ as } n \to \infty\}.$$

Note that these manifolds are invariant in the sense that $f(W^s(x, f)) = W^s(f(x), f)$.
Although locally differentiable, the manifolds can have extremely complicated structure
in general.



4.2 Discussion 

We now return to the investigation of shadowing orbits. 

There have been some attempts to examine the linear theory regarding nonuniformly 
hyperbolic maps in order to make statements about shadowing behavior (see for example
[24]). However, since the nonexistence of shadowing orbits fundamentally results
from degeneracy in the linear theory, it is also useful to consider what happens in
terms of the structure of nearby manifolds.

For almost every $x$, $f$ looks locally hyperbolic. However, in nonhyperbolic systems
if we iterate the orbit $\{f^i(x)\}$, we will eventually approach some sort of degeneracy.

For example, one possible scenario is that for some point $a \in \{f^i(x)\}$, $W^s(a, f)$
and $W^u(a, f)$ are nearly tangent and intersect each other at some nearby point, $y$. As
illustrated in figure 4.1, this structure implies a certain scenario for the evolution of
the manifolds as we map forward with $f$ or backward with $f^{-1}$. We will argue that this
situation is in some sense a multidimensional analog for the folding behavior we observed
in one dimension.

Figure 4.1: Possible situation near a homoclinic tangency. Note how a fold in the unstable
manifold is created as we map ahead by $f^n$, and a fold in the stable manifold is created as we
map back by $f^{-n}$.

Figure 4.2: An illustrative example of how homoclinic tangencies can cause problems for
shadowing.

For one thing, the homoclinic intersection of manifolds can prevent or at least hamper
shadowing. We illustrate this in figure 4.2. Consider for example two nearby points $a$
and $b$ such that $d(a, b) < \delta$ and let $\{c_n\}$ be a $\delta$-pseudo-orbit of $f$ with the following
form:

$$c_n = \begin{cases} f^n(a) & \text{if } n < 0 \\ f^n(b) & \text{if } n \geq 0 \end{cases}$$

In a uniformly hyperbolic scenario as shown in figure 4.2(a), we can easily pick a suitable
orbit to shadow $\{c_n\}$, namely $\{f^i(z)\}$ where $z = W^u_\epsilon(a, f) \cap W^s_\epsilon(b, f)$. However if a
homoclinic intersection is nearby as in figure 4.2(b), we see that there is no obvious way to
pick a shadowing orbit, since there may be no point $z$ satisfying $z = W^u_\epsilon(a, f) \cap W^s_\epsilon(b, f)$.
Note that the difficulty in finding a shadowing orbit seems to depend on how close $a$ is
to the homoclinic tangency, and the geometry of the manifolds nearby.
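The uniformly hyperbolic case can be made concrete with a toy example. The sketch below uses the linear map $(x, y) \mapsto (2x, y/2)$, chosen here only for illustration: its unstable manifold through $a$ is the horizontal line through $a$ and its stable manifold through $b$ is the vertical line through $b$, so their intersection $z$ can be written down directly. All function names are hypothetical, and the maximum-coordinate distance is an arbitrary choice of metric.

```python
def f(pt):
    """Linear hyperbolic map: expands x by 2, contracts y by 1/2."""
    x, y = pt
    return (2.0 * x, 0.5 * y)

def f_inv(pt):
    """Inverse of f."""
    x, y = pt
    return (0.5 * x, 2.0 * y)

def iterate(pt, n):
    """f^n(pt), using the inverse map when n is negative."""
    step = f if n >= 0 else f_inv
    for _ in range(abs(n)):
        pt = step(pt)
    return pt

def dist(u, v):
    """Maximum-coordinate distance between two points."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))

def pseudo_orbit(a, b, n):
    """c_n = f^n(a) for n < 0 and f^n(b) for n >= 0."""
    return iterate(a, n) if n < 0 else iterate(b, n)

def shadowing_point(a, b):
    """Intersection of W^u(a) (horizontal line) with W^s(b) (vertical line)."""
    return (b[0], a[1])

if __name__ == "__main__":
    a, b = (0.25, 0.5), (0.3125, 0.4375)
    z = shadowing_point(a, b)
    for n in range(-5, 6):
        print(n, dist(iterate(z, n), pseudo_orbit(a, b, n)))
```

In this linear setting the orbit of $z$ never strays farther from the pseudo-orbit than $d(a, b)$; near a homoclinic tangency no such intersection point need exist, which is the difficulty described above.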

Homoclinic tangencies could also cause asymmetrical shadowing in parameter space.
Numerical experiments with maps that favor higher parameters seem to show the following
scenario: As we map a state space region near a homoclinic tangency ahead by $f_{p_0}$
repeatedly, a tongue, or fold of the unstable manifold develops as the manifold expands.
If we examine the corresponding situation in a map with a slightly higher parameter
value, we find that the corresponding fold in the unstable manifold for the higher parameter
system overlaps the fold in the unstable manifold of the original system. In this
case we expect that the original system would have difficulty shadowing a trajectory
close to the apex of the fold in the higher parameter system. This situation is depicted
in figure 4.3. A similar argument works for $f^{-1}$. Numerical results seem to indicate that
for many families of systems at least, there is an ordering in parameter space such that

Figure 4.4: Refolding after a subsequent encounter with a homoclinic tangency.

Also recall that with maps of the interval, a folded region can get refolded upon a
subsequent encounter with a turning point. A similar thing can also happen in higher
dimensions. Consider figure 4.4 for example. Here we see that the folded tongue of the
unstable manifold gets refolded back on itself, possibly allowing lagging orbits to catch
up so that shadowing is possible. This suggests that there may be interesting shadowing
results of the sort described in chapter 3 for one dimension. The situation here, however,
is more complicated since in one dimension there were only a finite number of sources
of folding, namely the turning points, while here there are likely to be an infinite number
of sources for the folding.






Chapter 5 

Parameter estimation algorithms 



5.1 Introduction 

In this chapter we present new algorithms for estimating the parameters of chaotic 
systems. In particular we will be interested in investigating estimation algorithms for 
nonuniformly hyperbolic dynamical systems, because these systems include most of the 
chaotic systems likely to be encountered in physical applications. From our discussion 
in chapters 3 and 4, we know that there are three basic effects that are important to
consider when designing a parameter estimation algorithm for nonuniformly hyperbolic
dynamical systems: (1) most data points contribute very little to our knowledge of the
parameters of the system, while relatively few data points may be extremely sensitive to
parameters, (2) the sensitive sections of orbits reflect nearby folding behavior which must
be accurately modeled in order to extract information about the parameters, and (3)
the folding behavior often results in asymmetrical shadowing behavior in the parameter
space of the system, so we can generally eliminate only parameters slightly less than
or slightly greater than the actual parameter value. The goal is to develop an efficient
algorithm that takes all three of these effects into account.

Our basic strategy will be to take advantage of property (1) above by using a linear
filtering technique to scan through most of the data and attempt to locate parts of the 
trajectory where folding occurs. In sections of the trajectory where folding does occur, 
we will examine the data closely using a type of Monte-Carlo analysis which we have 
designed to circumvent the numerical pitfalls that accompany work with chaotic systems. 

We begin this chapter by surveying some traditional filtering techniques and examining
some basic approaches for parameter estimation problems (section 5.3). Those
readers who are familiar with traditional estimation theory may wish to skim these
sections. We go on in section 5.4 to examine how and why traditional algorithms fail
in high-precision estimation of chaotic systems. We then propose a new algorithm for
estimating the parameters of a chaotic system in one dimension (section 5.5). This
algorithm is generalized in section 5.6 to deal with systems in higher dimensions.

Numerical results describing the performance of these algorithms
are presented in chapter 6.



5.2 The estimation problem 

Let us begin by restating the problem (see footnote 1). Let:

$$x_{n+1} = f_p(x_n) \qquad (5.1)$$

and $$y_n = x_n + v_n \qquad (5.2)$$

where $x_n$ is the state of the system, $y_n$ are observations, $v_n$ represents noise, $f_p$ evolves
the state, $p \in I_p \subset \mathbb{R}$ is the scalar parameter we are trying to estimate, and $I_p$ is a closed
interval of the real line.

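It will also be useful to write the system in (5.1) and (5.2) in terms of $u_n = (x_n, p)$,
a combined vector of state and parameters:

$$u_{n+1} = g(u_n) \qquad (5.3)$$

$$y_n = H_n u_n + v_n \qquad (5.4)$$

where the map, $g$, satisfies $g(x, p) = (f_p(x), p)$, and:

$$H_n = \begin{bmatrix} I_q & 0 \end{bmatrix} \qquad (5.5)$$

where $I_q$ is a $q \times q$ identity matrix if the state, $x$, has dimension $q$, and $0$ is a $q \times 1$
column of zeros, so that $H_n u_n = x_n$ as required by (5.2).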



We now make a few remarks about notation. In general, throughout this chapter,
the letters $x$, $p$, $u$ will correspond to state, parameter, and state-parameter vectors. Set
$x^n = (x_0, x_1, \ldots, x_n)$, $y^n = (y_0, y_1, \ldots, y_n)$, and $u^n = (u_0, u_1, \ldots, u_n)$.

The symbol "^" above a vector will be used to denote an estimate. For example,
the estimate of the parameter $p$ based on the observations in $y^n$ will be denoted $\hat{p}_n$.
We will also use the notation, $\hat{u}_{n|k}$, to denote an estimate of $u_n$ based on observations,
$y^k$. Similarly, the symbol "~" will be used to denote an error quantity. For example we
might write that $\tilde{u}_n = u_n - \hat{u}_{n|n}$.



1 Note that the setup in (5.1) and (5.2) is somewhat less general than standard formulations of
filtering problems. For example one could add an extra term, $w_n$, to represent the system noise so that
$x_{n+1} = f_p(x_n) + w_n$, or one could add an extra function, $h_n(x)$, so that $y_n = h_n(x_n) + v_n$, to reflect
the fact that the observations might represent a more general function of the state. However, we have
elected to keep the problem as simple as possible in order to concentrate on how chaos affects estimation,
and to be consistent with the presentation in chapters 2-4.

5.3 Traditional approaches

We now examine some basic methods for approaching parameter estimation. In sections
5.3.1 and 5.3.2 we mainly concentrate on providing the motivation behind linear
techniques like the Kalman filter. This treatment is extended in section 5.3.3, where
nonlinear techniques are discussed in more detail. The material in this section is
well-known in the engineering community, but we explain it here because it provides the
basis for new algorithms we develop later to deal with chaotic systems.

There are a variety of ways to approach parameter estimation problems. Engineers
have developed a whole host of ad hoc tricks that may be applied in different situations.
The basic idea, however, is relatively simple. Given observations, $\{y_k\}_{k=0}^{n}$, and a model
for $f_p$, we would like to pick our parameter estimate, $p = \hat{p}_n$, so that there exists an
orbit, $\{x_k(p)\}_{k=0}^{n}$, of $f_p$ that makes the residuals,

$$\epsilon_k(p) = y_k - x_k(p)$$

as small as possible for $k \in \{0, 1, \ldots, n\}$. In order to choose the best possible estimate,
$\hat{p}_n$, we need some criterion for evaluating how small these residuals are.

From here, there are a number of different ways to approach the problem of how 
to choose the optimizing criteria to make use of all the known information. In fact, 
the recursive Kalman filter itself has many different possible interpretations. Many of 
the different approaches to parameter estimation provide interesting insight into the 
estimation problem itself. Our objective here will be to motivate some of the different 
ideas on how to look at parameter estimation, without getting immersed in specific 
derivations. The reader may consult [3], [29], or [23] for more detailed and/or formal 
treatments of this subject. 



5.3.1 Nonrecursive estimation 

Least squares estimation 

One of the simplest ideas about how to estimate parameters is to choose the estimate
$\hat{p}_n$ so that $p = \hat{p}_n$ minimizes the quantity:

$$S'_n(p) = \inf_{\{x_{i|n}(p)\}_{i=0}^{n} \in Z(p)} \left\{ \sum_{i=0}^{n} (y_i - x_{i|n}(p))^T (R'_i)^{-1} (y_i - x_{i|n}(p)) \right\} \tag{5.6}$$

where $Z(p)$ is the set of all orbits of $f_p$ and $(R'_i)^{-1}$ are symmetric positive-definite
matrices that weight the relative importance of various measurements. This sort of
idea, known as least squares estimation, dates back to Gauss [22].

The formulation in (5.6) is not really useful for estimating parameters in practice,
since there is no direct way of choosing $\hat{p}_n$ to minimize (5.6). Things become more
concrete, however, if we assume the function $g$ in (5.3) is linear in both state and
parameters.² In this case we can write:

$$y^n = G_n u_0 + v^n \tag{5.7}$$

where $G_n$ is a constant matrix that effectively represents the dynamics of the system.
Our goal is to get a good estimate for $u_0 = (x_0, p)$ based on the observations in $y^n$. In
this case, least squares estimation amounts to minimizing

$$S_n(u_0) = (y^n - G_n u_0)^T R^{-1} (y^n - G_n u_0) \tag{5.8}$$

with respect to $u_0$, where $R^{-1}$ is a positive-definite weighting matrix. Our estimate for
$u_0$ based on $y^n$, $\hat{u}_{0|n} = (\hat{x}_{0|n}, \hat{p}_n)$, is the value of $u_0$ that minimizes $S_n(u_0)$. We can find
the appropriate minimum of $S_n(u_0)$ by taking the derivative of $S_n$ with respect to $u_0$ and
setting it to zero. If we do this we find that the value of $u_0$ that minimizes $S_n(u_0)$ is:

$$\hat{u}_{0|n} = (G_n^T R^{-1} G_n)^{-1} G_n^T R^{-1} y^n \tag{5.9}$$

where $G_n^T$ denotes the transpose of $G_n$.
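As a concrete illustration of (5.9), the following minimal sketch applies weighted least squares to a toy system that is linear in both state and parameter. The system $x_{k+1} = a x_k + p$ with known $a$, the diagonal weighting, and the helper names `build_rows` and `weighted_least_squares` are assumptions made for this example only; they do not come from the report.

```python
def build_rows(a, n):
    """Rows of G_n for x_{k+1} = a*x_k + p, so that
    y_k = a**k * x0 + (1 + a + ... + a**(k-1)) * p + v_k."""
    rows, power, geom = [], 1.0, 0.0
    for _ in range(n + 1):
        rows.append((power, geom))   # coefficients of (x0, p) in y_k
        geom += power                # running sum 1 + a + ... + a**k
        power *= a                   # a**(k+1)
    return rows


def weighted_least_squares(rows, y, weights):
    """Solve (G^T R^-1 G) u = G^T R^-1 y for two unknowns and a diagonal R^-1."""
    m11 = m12 = m22 = b1 = b2 = 0.0
    for (g1, g2), yk, w in zip(rows, y, weights):
        m11 += w * g1 * g1
        m12 += w * g1 * g2
        m22 += w * g2 * g2
        b1 += w * g1 * yk
        b2 += w * g2 * yk
    det = m11 * m22 - m12 * m12
    # Cramer's rule for the symmetric 2x2 normal equations
    return ((m22 * b1 - m12 * b2) / det, (m11 * b2 - m12 * b1) / det)


# Noise-free demonstration: the estimate should reproduce (x0, p).
a, x0, p = 0.9, 0.3, 0.05
rows = build_rows(a, 20)
y = [g1 * x0 + g2 * p for g1, g2 in rows]
x0_hat, p_hat = weighted_least_squares(rows, y, [1.0] * len(y))
```

With noisy data the same call returns the minimizer of (5.8); unequal entries in `weights` play the role of a diagonal $R^{-1}$.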

Stochastic framework 

Another way to approach the problem is to think of $u_n$, $y_n$, and $v_n$ as random
variables. We shall assume that the $v_n$'s are independent random variables with zero
mean. The idea is to choose a parameter estimate, $\hat{p}_n$, based on $y^n$ so that the residuals,
$\epsilon_i(p) = y_i - x_i(p)$, are as close to zero as possible in some statistical sense for
$i \in \{0, 1, \ldots, n\}$.

We can write the probability density function³ for $u_n$ given $y^k$ according to Bayes
rule:

$$P(u_n | y^k) = \frac{P(y^k | u_n) P(u_n)}{P(y^k)} \tag{5.10}$$

These density functions describe everything we might know about the states and parameters
of the system. Later we will examine more closely how tracking such probability
densities in full can provide information about how to choose parameter estimates,
especially in cases involving nonlinear or chaotic systems. To start with, however, we
concentrate on examining conventional filters which look only at first and second order
moments of these densities.



2 Note that this assumption is extremely restrictive in practice, since even if the system is linear
with respect to state, it is generally nonlinear with respect to combined states and parameters. The
purpose of this example, however, is to simply motivate linear ideas. We address nonlinearity in the
next section.

3 Contrary to common convention, our choice of the letter $p$ for the parameter necessitates using a
capital $P$ to denote probability density functions. Thus $P(u_n | y^k)$ represents the density for $u_n$ given
the value of $y^k$.

Minimum variance

Given the density function, $P(u_0 | y^n)$, one approach is to pick the estimate, $\hat{u}_{0|n}$, to
minimize the variance,

$$E[(u_0 - \hat{u}_{0|n})^T (u_0 - \hat{u}_{0|n})] \tag{5.11}$$

where $E[x] = \int x P(x)\,dx$ denotes the expected value of $x$. This criterion is called the
minimum variance condition. It turns out that this estimator has particularly nice
properties. For instance, it is not hard to show (e.g., [57]) that the $\hat{u}_{0|n}$ that minimizes
(5.11) also satisfies:

$$\hat{u}_{0|n} = E[u_0 | y^n]$$

for any density function, $P(u_0 | y^n)$.

Now suppose that $g$ is linear in state and parameters so that (5.7) is satisfied. Let
us attempt to find the so-called optimal linear estimator:

$$\hat{u}_{0|n} = A_n y^n + b_n$$

where the constant matrix, $A_n$, and constant vector, $b_n$, are chosen to minimize the
variance condition in (5.11). Assuming that the estimator is unbiased (i.e.,
$E(u_0 - \hat{u}_{0|n}) = 0$) then:

$$b_n = E(u_0) - A_n E(y^n).$$

Minimizing $E[(u_0 - \hat{u}_{0|n})^T (u_0 - \hat{u}_{0|n})]$ we find ([57]) that

$$A_n = [Q^{-1} + G_n^T R_n^{-1} G_n]^{-1} G_n^T R_n^{-1} \tag{5.12}$$

where $Q = E[u_0 u_0^T]$ is the covariance matrix of $u_0$ and $R_n = E[v^n (v^n)^T]$ is the covariance
matrix of $v^n$. Thus we have:

$$\hat{u}_{0|n} = E(u_0) + A_n (y^n - E[y^n]) \tag{5.13}$$

where $A_n$ is as given in (5.12). Comparing this result with (5.9) we see that the $\hat{u}_{0|n}$
above, which we derived as the linear estimator with minimum variance, actually looks
a lot like the estimator from the deterministic least squares approach except for the
addition of a priori information about $u_0$ (in the form of $E(u_0)$ and the covariance
$Q$). With the minimum variance approach, the weighting factor $R_n$ also has a definite
interpretation as the covariance of the measurement noise.

Furthermore, if we assume that $u_n$ and $v_n$ are Gaussian random variables,⁴ and
attempt to optimize the estimator $\hat{u}_{0|n}$ for minimum variance, we again find (see [30])
that $\hat{u}_{0|n}$ has the form given in (5.12) and (5.13).
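The relationship between (5.9) and (5.12)-(5.13) can be checked numerically in the scalar case. The sketch below assumes a single unknown $u_0$ observed as $y_k = g_k u_0 + v_k$ with a common noise variance `r`; the function name and the sample numbers are illustrative assumptions, not taken from the report.

```python
def min_variance_estimate(g, y, r, prior_mean, prior_var):
    """Scalar form of (5.12)-(5.13): u_hat = E(u0) + A_n (y - E[y])."""
    # Q^-1 + G^T R^-1 G
    information = 1.0 / prior_var + sum(gk * gk for gk in g) / r
    # G^T R^-1 (y - G E(u0))
    weighted_residual = sum(gk * (yk - gk * prior_mean) for gk, yk in zip(g, y)) / r
    return prior_mean + weighted_residual / information


g = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]

# Plain least squares (5.9) for comparison: (G^T G)^-1 G^T y
least_squares = sum(gk * yk for gk, yk in zip(g, y)) / sum(gk * gk for gk in g)

vague_prior = min_variance_estimate(g, y, r=0.1, prior_mean=0.0, prior_var=1e12)
sharp_prior = min_variance_estimate(g, y, r=0.1, prior_mean=0.0, prior_var=1e-12)
```

A very broad prior makes $Q^{-1}$ negligible and the estimate approaches the least squares value, while a very narrow prior pins the estimate to $E(u_0)$.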



4 A random variable $v \in \mathbb{R}^q$ has Gaussian distribution if

$$P(v) = \frac{1}{(2\pi)^{q/2} |\Sigma_v|^{1/2}} e^{-\frac{1}{2}(v - E[v])^T \Sigma_v^{-1} (v - E[v])}$$

where $E[v]$ is the expected value of $v$ and $\Sigma_v = E[vv^T]$ is the covariance matrix of $v$.

Thus, in summary, we see that the optimal estimator, $\hat{u}_{0|n}$, as given in (5.12) and
(5.13) has a number of different interpretations. If the system, $g$, is linear then the
estimator can be thought of as resulting from a deterministic least squares approach. If
$u_n$ and $v_n$ are thought of as random variables, then $\hat{u}_{0|n} = E[u_0 | y^n]$, and if we assume
that $u_n$ and $v_n$ are Gaussian then the $\hat{u}_{0|n}$ given in (5.13) satisfies the minimum variance
condition. Alternatively, if we drop the Gaussian assumption and search for the best
linear estimator that minimizes the variance condition, we find that $\hat{u}_{0|n}$ as given in
(5.12) and (5.13) is the optimal linear estimator. All these interpretations motivate us
to use the estimator given in (5.12) and (5.13).



5.3.2 The Kalman filter 

We now have the form of an optimal filter for linear systems. However, the filter has
problems computationally. It would be nice if there were a way so that new data could be
taken into account easily without having to recompute everything. This is accomplished
with the recursive Kalman filter.

The Kalman filter is mathematically equivalent to the linear estimator described in
(5.12) and (5.13), except that it has some important computational advantages. The
basic premise of the Kalman filter is that the state of the filter can be kept with two
statistics, $\hat{u}_{n|n}$ and $\Sigma_{n|n}$, where $\Sigma_{n|n}$ is the covariance matrix,
$E[(u_n - \hat{u}_{n|n})(u_n - \hat{u}_{n|n})^T]$.
Once we have these two particular statistics, it will be possible, for example, to determine
the next state of the filter, $\hat{u}_{n+1|n+1}$ and $\Sigma_{n+1|n+1}$, directly given a new piece of data,
$y_{n+1}$, the filter's present state, $\hat{u}_{n|n}$, $\Sigma_{n|n}$, and knowledge of the map $g$.

Specifically, suppose we are given the linear system:

$$u_{n+1} = \Phi_n u_n$$

$$y_n = H_n u_n + v_n$$

where $v_n$ are independent random variables with zero mean and covariance $R_n$. The
recursive Kalman filter can be written in two parts:

Prediction:

$$\hat{u}_{n+1|n} = \Phi_n \hat{u}_{n|n} \tag{5.14}$$

$$\Sigma_{n+1|n} = \Phi_n \Sigma_{n|n} \Phi_n^T + R_{n+1} \tag{5.15}$$

Combination:

$$\hat{u}_{n+1|n+1} = \hat{u}_{n+1|n} + K_{n+1}(y_{n+1} - H_{n+1} \hat{u}_{n+1|n}) \tag{5.16}$$

$$\Sigma_{n+1|n+1} = (I - K_{n+1} H_{n+1}) \Sigma_{n+1|n} \tag{5.17}$$

where the Kalman gain, $K_{n+1}$, is given by:

$$K_{n+1} = \Sigma_{n+1|n} H_{n+1}^T [H_{n+1} \Sigma_{n+1|n} H_{n+1}^T + R_{n+1}]^{-1}. \tag{5.18}$$

Motivation and derivation



The Kalman filter can be motivated in the following way.⁵ Consider the metric space,
$X$, of random variables where inner products and norms are defined by:

$$\langle x, y \rangle = E[x y^T]$$

$$\text{and} \quad \|x\| = \langle x, x \rangle$$

if $x, y \in X$. Let $Y_n = \mathrm{span}\{y_0, y_1, \ldots, y_n\}$ be the space of all linear combinations
of $\{y_0, y_1, \ldots, y_n\}$. To satisfy the minimum variance condition, we would like to pick
$\hat{u}_{n|n} \in Y_n$ to minimize:

$$E[\tilde{u}_n^T \tilde{u}_n] = \|\tilde{u}_n\|$$

where $\tilde{u}_n = u_n - \hat{u}_{n|n}$. This formulation gives a definite geometric interpretation for the
minimization problem and helps to show intuitively what the appropriate $\hat{u}_{n|n}$ is. In
order to minimize the distance between $u_n$ and $\hat{u}_{n|n} \in Y_n$, it makes sense to pick $\hat{u}_{n|n}$ so
that $\tilde{u}_n$ is orthogonal to $Y_n$. That is, we require:

$$\langle \tilde{u}_n, y \rangle = 0 \tag{5.19}$$

for any $y \in Y_n$. It is not hard to show that this condition is in fact sufficient to minimize
$E[\tilde{u}_n^T \tilde{u}_n]$ (see e.g., [3]). From a statistical standpoint, this result also makes sense since it
says that the error of the estimate, $\tilde{u}_n$, should be uncorrelated with the measurements.
In some sense, the estimate uses all the information contained in the measurements.

We can now derive the equations of the Kalman filter. The prediction equations are
relatively straightforward:

$$\hat{u}_{n+1|n} = E[u_{n+1} | y^n] = \Phi_n \hat{u}_{n|n}$$

$$\Sigma_{n+1|n} = E[(u_{n+1} - \hat{u}_{n+1|n})(u_{n+1} - \hat{u}_{n+1|n})^T] = \Phi_n \Sigma_{n|n} \Phi_n^T + R_{n+1}.$$

For the estimator $\hat{u}_{n+1|n+1}$ to be unbiased, $\hat{u}_{n+1|n+1}$ must have the form given in
(5.16). Let us now verify that the formula for $K_{n+1}$ in (5.18) makes the Kalman
filter an optimal linear estimator. To do this, we must show that $K_{n+1}$ minimizes the
variance, $E[\tilde{u}_{n+1}^T \tilde{u}_{n+1}]$, where $\tilde{u}_{n+1} = u_{n+1} - \hat{u}_{n+1|n+1}$. Since $\hat{u}_{n+1|n+1} \in Y_{n+1}$ we know
from (5.19) that a sufficient condition for $E[\tilde{u}_{n+1}^T \tilde{u}_{n+1}]$ to be minimized is that:

$$E[\tilde{u}_{n+1}^T \hat{u}_{n+1|n+1}] = \mathrm{Trace}\, E[\tilde{u}_{n+1} \hat{u}_{n+1|n+1}^T] = 0. \tag{5.20}$$



5 Much of the explanation here follows the exposition in Siapas [56].

Let us investigate the consequences of this condition. First we have:

$$\begin{aligned}
\tilde{u}_{n+1} &= \Phi_n u_n - [\hat{u}_{n+1|n} + K_{n+1}(y_{n+1} - H_{n+1} \hat{u}_{n+1|n})] \\
&= \Phi_n u_n - \Phi_n \hat{u}_{n|n} - K_{n+1}[H_{n+1} u_{n+1} + v_{n+1}] + K_{n+1} H_{n+1} \Phi_n \hat{u}_{n|n} \\
&= (I - K_{n+1} H_{n+1}) \Phi_n \tilde{u}_n - K_{n+1} v_{n+1}
\end{aligned}$$

So,

$$\begin{aligned}
E[\tilde{u}_{n+1} \hat{u}_{n+1|n+1}^T] &= E[\{(I - K_{n+1} H_{n+1}) \Phi_n \tilde{u}_n - K_{n+1} v_{n+1}\} \{\hat{u}_{n+1|n} + K_{n+1}(y_{n+1} - H_{n+1} \hat{u}_{n+1|n})\}^T] \\
&= E[\{(I - K_{n+1} H_{n+1}) \Phi_n \tilde{u}_n - K_{n+1} v_{n+1}\} \{\Phi_n \hat{u}_{n|n} + K_{n+1} H_{n+1} \Phi_n \tilde{u}_n + K_{n+1} v_{n+1}\}^T]
\end{aligned} \tag{5.21}$$

Since we require that $E[\tilde{u}_n^T \hat{u}_{n|n}] = \mathrm{Trace}\{E[\tilde{u}_n \hat{u}_{n|n}^T]\} = 0$, from (5.21) we get that:

$$\begin{aligned}
\mathrm{Trace}\{E[\tilde{u}_{n+1} \hat{u}_{n+1|n+1}^T]\}
&= \mathrm{Trace}\{(I - K_{n+1} H_{n+1}) \Phi_n E[\tilde{u}_n \tilde{u}_n^T] \Phi_n^T H_{n+1}^T K_{n+1}^T - K_{n+1} E[v_{n+1} v_{n+1}^T] K_{n+1}^T\} \\
&= \mathrm{Trace}\{\Phi_n \Sigma_{n|n} \Phi_n^T H_{n+1}^T K_{n+1}^T - K_{n+1} H_{n+1} \Phi_n \Sigma_{n|n} \Phi_n^T H_{n+1}^T K_{n+1}^T - K_{n+1} R_{n+1} K_{n+1}^T\} \\
&= \mathrm{Trace}\{[\Sigma_{n+1|n} H_{n+1}^T - K_{n+1}(H_{n+1} \Sigma_{n+1|n} H_{n+1}^T + R_{n+1})] K_{n+1}^T\}.
\end{aligned}$$

Thus, choosing $K_{n+1} = \Sigma_{n+1|n} H_{n+1}^T [H_{n+1} \Sigma_{n+1|n} H_{n+1}^T + R_{n+1}]^{-1}$ as in (5.18) makes
$\mathrm{Trace}\{E[\tilde{u}_{n+1} \hat{u}_{n+1|n+1}^T]\} = 0$ and therefore minimizes $E[\tilde{u}_{n+1}^T \tilde{u}_{n+1}]$.
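The optimality of the gain in (5.18) can also be checked numerically. The sketch below assumes a two-component vector $u = (x, p)$ with a scalar observation of $x$ (so $H = [1, 0]$) and a scalar noise variance `r`; the function names and the sample covariance are assumptions made for illustration. It compares the error variance obtained with the gain (5.18) against the variance obtained with slightly perturbed gains.

```python
def combine(u_pred, sigma_pred, y, r):
    """One combination step (5.16)-(5.18) for u = (x, p) and H = [1, 0]."""
    (s11, s12), (s21, s22) = sigma_pred
    innovation_var = s11 + r                              # H Sigma H^T + R
    gain = (s11 / innovation_var, s21 / innovation_var)   # (5.18)
    residual = y - u_pred[0]                              # y - H u_pred
    u_new = (u_pred[0] + gain[0] * residual,
             u_pred[1] + gain[1] * residual)              # (5.16)
    sigma_new = ((s11 - gain[0] * s11, s12 - gain[0] * s12),
                 (s21 - gain[1] * s11, s22 - gain[1] * s12))  # (5.17)
    return u_new, sigma_new, gain


def error_trace(sigma_pred, gain, r):
    """Trace of (I - K H) Sigma (I - K H)^T + K R K^T for an arbitrary gain K."""
    (s11, s12), (s21, s22) = sigma_pred
    k0, k1 = gain
    a = 1.0 - k0
    t11 = a * a * s11 + k0 * k0 * r
    t22 = k1 * k1 * s11 - k1 * s12 - k1 * s21 + s22 + k1 * k1 * r
    return t11 + t22


sigma_pred = ((0.04, 0.01), (0.01, 0.02))
u_new, sigma_new, gain = combine((0.5, 3.8), sigma_pred, y=0.53, r=0.01)
best = error_trace(sigma_pred, gain, r=0.01)
worse = [error_trace(sigma_pred, (gain[0] + d0, gain[1] + d1), r=0.01)
         for d0, d1 in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05))]
```

Note that `sigma_new` is computed without reference to the measurement `y`; only the state estimate uses the residual.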



The equation for $\Sigma_{n+1|n+1}$ in (5.17) can then be derived by simply evaluating
$\Sigma_{n+1|n+1} = E[\tilde{u}_{n+1} \tilde{u}_{n+1}^T]$.
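For completeness, that evaluation can be sketched as follows. The steps assume that $v_{n+1}$ is uncorrelated with the prediction error $u_{n+1} - \hat{u}_{n+1|n}$, whose covariance is $\Sigma_{n+1|n}$ by definition, and that $K_{n+1}$ is the gain in (5.18); subscripts on $K$, $H$, and $R$ are dropped for brevity.

```latex
\begin{aligned}
\Sigma_{n+1|n+1}
  &= E[\tilde{u}_{n+1}\tilde{u}_{n+1}^T] \\
  &= (I - KH)\,\Sigma_{n+1|n}\,(I - KH)^T + K R K^T \\
  &= (I - KH)\,\Sigma_{n+1|n}
     - \left[\Sigma_{n+1|n} H^T - K\,(H \Sigma_{n+1|n} H^T + R)\right] K^T \\
  &= (I - KH)\,\Sigma_{n+1|n}
\end{aligned}
```

The bracketed term in the third line vanishes for the gain in (5.18), which leaves (5.17).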



5.3.3 Nonlinear estimation 

Probability densities 

The filters we looked at in the previous section are optimal linear estimators in the
sense that a minimum variance or least squares condition is satisfied. Estimators like
the Kalman filter are only optimal, however, if the system is linear and the corresponding
probability densities are Gaussian. Let us now consider how one might
approach estimation problems when these rather stringent conditions are relaxed.

Let us begin by recalling the density function in (5.10):

$$P(u_n | y^k) = \frac{P(y^k | u_n) P(u_n)}{P(y^k)} \tag{5.22}$$

where $u_n = (x_n, p)$ is the joint vector of state and parameters and $y^k = (y_0, y_1, \ldots, y_k)$
represents a vector of observations. This density function represents everything we know

Figure 5.1: Mapping probability densities using g and combining them with new information.
This is a probabilistic view of what a recursive estimator like the Kalman filter does. Note
that Gaussian densities have equal probability density surfaces that form ellipsoids. In two
dimensions we draw the densities as ellipses.

mapping $P(u_n | y^n)$ using the system dynamics, $g$. More precisely we have that:

$$P(u_{n+1} | y^n) = \sum_{z \in U(u_{n+1})} \left[ P(z | y^n)\, |Dg(z)|^{-1} \right] \tag{5.23}$$

where $U(u_{n+1}) = \{z \mid z = g^{-1}(u_{n+1})\}$ and $|Dg(z)|$ is the determinant of the Jacobian of
$g$ evaluated at $z$. It is not hard to show that if $g$ is linear and $P(u_n | y^n)$ is Gaussian
then $P(u_{n+1} | y^n)$ is also Gaussian. Also by Bayes rule, ($P(A, B) = P(A|B)P(B) =
P(B|A)P(A)$) we have that:

$$P(u_{n+1}, y_{n+1} | y^n) = P(u_{n+1} | y^{n+1}) P(y_{n+1} | y^n) = P(y_{n+1} | u_{n+1}, y^n) P(u_{n+1} | y^n)$$

where $P(y_{n+1} | y^n) = \int P(y_{n+1} | u_{n+1}) P(u_{n+1} | y^n)\, du_{n+1}$. Thus we find that combining
information from a new measurement, $y_{n+1}$, results in the density:

$$P(u_{n+1} | y^{n+1}) = \frac{P(y_{n+1} | u_{n+1}) P(u_{n+1} | y^n)}{P(y_{n+1} | y^n)}. \tag{5.24}$$

Since the denominator is independent of $u_{n+1}$, it is simply a normalizing factor and is
therefore not important for our considerations. Also note that since $P(y_{n+1} | u_{n+1})$ and
$P(u_{n+1} | y^n)$ are Gaussian, $P(u_{n+1} | y^{n+1})$ must also be Gaussian. Thus, by induction if
all the data is Gaussian distributed, then $P(u_k | y^k)$ must be Gaussian for any $k$. Also,
the MAP (maximum a posteriori) estimate and minimum variance estimate for $u_{n+1}$ are
both the same, namely $\hat{u}_{n+1|n+1} = E[u_{n+1} | y^{n+1}]$.
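The two operations in (5.23) and (5.24) can be sketched for a scalar state with the parameter held fixed. The logistic map $f_p(x) = p x (1 - x)$, the grid representation, and all function names below are assumptions chosen for illustration; with $p$ fixed, $|Dg(z)|$ reduces to $|f_p'(z)|$.

```python
import math


def push_forward(density, u, p):
    """Evaluate (5.23) at u for the logistic map f_p(x) = p*x*(1-x), p fixed."""
    disc = 1.0 - 4.0 * u / p
    if disc < 0.0:
        return 0.0                # u has no preimage under f_p
    if disc == 0.0:
        return math.inf           # fold point: |f_p'| = 0, the density is singular
    root = math.sqrt(disc)
    preimages = ((1.0 - root) / 2.0, (1.0 + root) / 2.0)
    jacobian = p * root           # |f_p'(z)| = p*|1 - 2z| at both preimages
    return sum(density(z) for z in preimages) / jacobian


def bayes_update(grid, prior, y, noise_var):
    """Evaluate (5.24) on a grid for y = x + v with v ~ N(0, noise_var)."""
    likelihood = [math.exp(-0.5 * (y - x) ** 2 / noise_var) for x in grid]
    unnormalized = [l * q for l, q in zip(likelihood, prior)]
    total = sum(unnormalized)     # stands in for the normalizing factor P(y_{n+1} | y^n)
    return [w / total for w in unnormalized]


def uniform(z):
    return 1.0 if 0.0 <= z <= 1.0 else 0.0


mapped_value = push_forward(uniform, 0.75, 4.0)

grid = [i / 100.0 for i in range(101)]
posterior = bayes_update(grid, [uniform(x) for x in grid], y=0.3, noise_var=0.01)
peak = grid[posterior.index(max(posterior))]
```

For a uniform density on $[0, 1]$ and $p = 4$, (5.23) gives $1 / (2\sqrt{1 - u})$, which is unbounded at $u = 1$. This is a simple instance of how a fold concentrates probability and why Gaussian descriptions break down there.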

Now consider what happens if the system is nonlinear. The appropriate densities still 
describe all we know about the states and parameters. In particular, the equations in 
(5.23) and (5.24) are still valid descriptions of how to map ahead and combine densities. 
However, in general there are no constraints on the form of these densities. As a practical 
matter, the problem becomes how can we deal with these arbitrary probability densities? 
How can one represent approximations of the densities in a computationally tractable 
form while still retaining enough information to generate useful estimates? There have 
been a number of efforts in this area: 

Extended Kalman filter 

The most basic and widely used trick is to simply linearize the system around the
best estimate of the trajectory and then use the Kalman filter. The idea is that if the
covariances of the relevant probability densities are small enough, then the system acts
approximately linearly on the densities, so linear filtering may adequately describe the
situation. For the system,

$$u_{n+1} = g(u_n) \tag{5.25}$$

$$y_n = H_n u_n + v_n, \tag{5.26}$$

as in (5.3), (5.4), and (5.5), the extended Kalman filter is given by the following equations,
mirroring the Kalman filter in (5.14)-(5.18):

Prediction:

$$\hat{u}_{n+1|n} = g(\hat{u}_{n|n}) \tag{5.27}$$

$$\Sigma_{n+1|n} = Dg(\hat{u}_{n|n})\, \Sigma_{n|n}\, Dg(\hat{u}_{n|n})^T \tag{5.28}$$

Combination:

$$\hat{u}_{n+1|n+1} = \hat{u}_{n+1|n} + K_{n+1}(y_{n+1} - H_{n+1} \hat{u}_{n+1|n}) \tag{5.29}$$

$$\Sigma_{n+1|n+1} = (I - K_{n+1} H_{n+1}) \Sigma_{n+1|n} \tag{5.30}$$

where the Kalman gain, $K_{n+1}$, is given by:

$$K_{n+1} = \Sigma_{n+1|n} H_{n+1}^T [H_{n+1} \Sigma_{n+1|n} H_{n+1}^T + R_{n+1}]^{-1}. \tag{5.31}$$

Other work in nonlinear estimation



A number of other efforts to do estimation on nonlinear systems have concentrated
on developing a better description of the probability densities. For example, in [23]
methods are presented that attempt to take into account second order behavior from
the dynamics. However, the method still relies on a basically Gaussian assumption of the
error distributions, since it computes and propagates only the mean and covariance
matrices of densities, adjusting the computations to account for errors due to nonlinearity.
Taking into account higher order effects in the densities is in fact a difficult proposition
because there is no obvious representation for these densities. Gaussian densities
are invariant under linear transformations, and are especially easy to deal with when
it comes to combining data from new measurements. However, similar higher order
representations do not exist.

Other methods do attempt to get a better representation of the error densities. For
example in [2], a method is proposed whereby the densities are represented as a sum of
Gaussians. For example, one might write:

$$P(u) = \sum_i a_i N(u; m_i, \Sigma_i)$$

where the $a_i$'s represent scalar constants and $N(u; m_i, \Sigma_i)$ evaluates the Gaussian density
function with mean $m_i$ and covariance matrix $\Sigma_i$ at $u$.⁶ If each of the Gaussians in the
sum is localized in state-parameter space (has small covariance) then we might be
able to use linear filters to evolve and combine each density in the sum in order to
generate a representation of the entire density.
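A minimal scalar sketch of the Gaussian sum idea is given below. It assumes a direct observation $y = u + v$ with noise variance `r`, and it updates each term with the scalar Kalman formulas while reweighting the terms by how well each one predicts the measurement. The function names and numbers are illustrative assumptions and are not taken from [2].

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, terms):
    """Evaluate sum_i a_i N(x; m_i, s_i); terms holds (a_i, m_i, s_i) triples."""
    return sum(a * normal_pdf(x, m, s) for a, m, s in terms)


def mixture_update(terms, y, r):
    """Combine every Gaussian term with the measurement y = u + v, v ~ N(0, r)."""
    updated = []
    for a, m, s in terms:
        gain = s / (s + r)                      # scalar Kalman gain for this term
        evidence = normal_pdf(y, m, s + r)      # how well this term predicts y
        updated.append((a * evidence, m + gain * (y - m), (1.0 - gain) * s))
    total = sum(w for w, _, _ in updated)
    return [(w / total, m, s) for w, m, s in updated]


prior = [(0.5, 0.2, 0.01), (0.5, 0.7, 0.02)]
posterior = mixture_update(prior, y=0.35, r=0.01)

# Direct application of Bayes rule at one point, for comparison.
x = 0.4
normalizer = sum(a * normal_pdf(0.35, m, s + 0.01) for a, m, s in prior)
direct = mixture_pdf(x, prior) * normal_pdf(0.35, x, 0.01) / normalizer
```

In this linear-observation case the updated sum reproduces Bayes rule exactly; the difficulty discussed in the text arises when the dynamics stretch and fold the individual terms.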



5.4 Applying traditional techniques to chaotic systems

In this section we examine why traditional techniques have a difficult time performing
high accuracy parameter estimation on chaotic systems. This investigation will illuminate
some of the general difficulties one encounters when dealing with chaotic systems,
and will provide some useful ground rules for designing new parameter estimation
algorithms.

Let us attempt, for example, to naively apply an estimator like the extended Kalman
filter in (5.27)-(5.31) to a chaotic system and see what problems emerge.



6 In other words, $N(u; m_i, \Sigma_i) = \frac{1}{(2\pi)^{q/2} |\Sigma_i|^{1/2}} e^{-\frac{1}{2}(u - m_i)^T \Sigma_i^{-1} (u - m_i)}$ if $q$ is the dimension of $u$.

The first problem one is likely to encounter is numerical in nature, and has a relatively
well-known solution. It turns out that the formulation in (5.27)-(5.31) is not numerically
sound. The problems are especially bad in chaotic systems because covariance
matrices become ill-conditioned quickly as densities are stretched exponentially along
unstable manifolds and contracted exponentially along stable manifolds. Similar sorts
of problems, albeit less severe, have been encountered and dealt with by conventional
filtering theory. One solution is to represent the covariance matrix $\Sigma_{n|n}$ as the product
of two matrices:

$$\Sigma_{n|n} = S_{n|n} S_{n|n}^T \tag{5.32}$$

and propagate the matrices $S_{n|n}$ instead of $\Sigma_{n|n}$. These estimation techniques, known
as square root algorithms, are mathematically the same as the Kalman filter, but have
the advantage that they are less sensitive to ill-conditioned covariance matrices. Using
square root algorithms, for instance, the resulting covariance matrices are assured to
remain positive definite. Since the decomposition in (5.32) is not unique, there are
a number of possible implementations for such algorithms. The reader is referred to
Kaminski [31] and related papers for detailed implementation descriptions.⁷
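One possible choice of factor is the lower-triangular Cholesky factor, sketched below for a 2x2 covariance. This is only an illustration of the decomposition itself, under the assumption that the input is symmetric positive definite; it is not the "Square Root Covariance II" implementation of [31], and the function names are hypothetical.

```python
import math


def cholesky_2x2(sigma):
    """Lower-triangular S with S S^T = sigma, for a symmetric positive-definite 2x2 sigma."""
    (s11, _), (s21, s22) = sigma
    l11 = math.sqrt(s11)
    l21 = s21 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    return ((l11, 0.0), (l21, l22))


def reconstruct(factor):
    """Return S S^T for a 2x2 factor S."""
    (a, b), (c, d) = factor
    return ((a * a + b * b, a * c + b * d),
            (a * c + b * d, c * c + d * d))


sigma = ((4.0e-2, 1.0e-2), (1.0e-2, 2.0e-2))
factor = cholesky_2x2(sigma)
rebuilt = reconstruct(factor)
```

Because any real factor $S$ gives $x^T S S^T x = \|S^T x\|^2 \ge 0$, a covariance rebuilt from a propagated factor cannot acquire negative variances through roundoff, which is the property the text refers to.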

Other problems result from the nonlinearity of the system. Some of these problems
can be observed in general nonlinear systems, while others seem to be unique to chaotic
systems. First of all, using a linearized parameter estimation technique on any nonlinear
system can cause trouble, even if the system is not chaotic. Often errors due to
nonlinearity cause the filter to become too confident in its estimates, which prevents the
filter from updating its information correctly based on new data and eventually locks
the filter into a parameter estimate with larger error than expected. This phenomenon is
known as divergence.⁸ It is not hard to see why divergence can become a problem with
estimators like the Kalman filter. For example, in the linear Kalman filter, note that
the estimation error covariance matrix, $\Sigma_{n|n}$, can actually be precomputed without
knowledge of the data. In other words there is no feedback between the actual performance
of the filter and the filter's estimate of its own accuracy. In the extended Kalman
filter there is also virtually no feedback between the observed residuals, $y_n - H_n \hat{u}_n$, and
the computed covariance matrix, $\Sigma_{n|n}$.
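To make the structure of the extended Kalman filter concrete, the following sketch implements one cycle of (5.27)-(5.31) for a combined vector $u = (x, p)$. The logistic map $f_p(x) = p x (1 - x)$, the choice $H = [1, 0]$, the scalar noise variance `r`, and the name `ekf_step` are assumptions made for this example; the report's own numerical experiments use a square root implementation instead.

```python
def ekf_step(u_hat, sigma, y, r):
    """One prediction/combination cycle (5.27)-(5.31) for u = (x, p), H = [1, 0]."""
    x, p = u_hat
    (s11, s12), (s21, s22) = sigma
    # Prediction (5.27)-(5.28): g(x, p) = (p*x*(1-x), p), Dg = [[a, b], [0, 1]]
    a, b = p * (1.0 - 2.0 * x), x * (1.0 - x)
    x_pred = p * x * (1.0 - x)
    q11 = a * (a * s11 + b * s21) + b * (a * s12 + b * s22)
    q12 = a * s12 + b * s22
    q21 = a * s21 + b * s22
    q22 = s22
    # Combination (5.29)-(5.31)
    k0, k1 = q11 / (q11 + r), q21 / (q11 + r)
    residual = y - x_pred
    u_new = (x_pred + k0 * residual, p + k1 * residual)
    sigma_new = ((q11 - k0 * q11, q12 - k0 * q12),
                 (q21 - k1 * q11, q22 - k1 * q12))
    return u_new, sigma_new


# Noise-free observations of an orbit with x_0 = 0.3 and p = 3.9.
truth = [0.3]
for _ in range(20):
    truth.append(3.9 * truth[-1] * (1.0 - truth[-1]))

u_hat, sigma = (0.3, 3.9), ((1e-4, 0.0), (0.0, 1e-4))
for y in truth[1:]:
    u_hat, sigma = ekf_step(u_hat, sigma, y, r=1e-4)
```

In `ekf_step` the residual enters only the update of `u_new`. The covariance depends on the data solely through the point at which `Dg` is evaluated, which is the lack of feedback described above.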

The divergence problem is considerably worse in nonuniformly hyperbolic systems
than it is in other nonlinear applications. This is because folding, a highly nonlinear
phenomenon, is crucial to parameter estimation. While linearized strategies may do
reasonably well following most chaotic trajectories if the uncertainty variances are small,
linearized techniques invariably have great trouble with the sections of trajectories that
are most sensitive to parameter perturbations. Figure 5.2 gives a schematic of what
happens when folding occurs. The linearized probability densities in that case become



7 In this report, whenever we refer to numerical results using square root filtering techniques, the 
implementation we use is the one given in [31] labeled "Square Root Covariance II." 
8 See for example, Ljung [41] for discussion of some related work.

Figure 5.2: In this picture we show a typical example of what can happen to probability
densities in chaotic systems. Because of the effects of local folding, linear filters like the
Kalman filter sometimes have difficulty tracking nonuniformly hyperbolic dynamical systems.

In chapter 6, we show some examples of the performance of the square root extended 
Kalman filter on various maps. The filter generally performs reasonably well at first 
but eventually diverges as the trajectory it is tracking passes close to a folding area. As 
we observed earlier, once the extended Kalman filter becomes too confident about its 
estimate, it generally cannot recover. While various ad hoc techniques can make small 
improvements to this problem, none of the standard techniques I encountered did an 
adequate job of handling the folding. For example, consider the case of the Gaussian 
sum filter, which is basically the only method that one might expect to have a chance at 
modeling the folding behavior. Note that the densities in the Gaussian sum have to be 
re-decomposed into constituent Gaussians every few iterations because of spreading, as 
expansion along unstable manifolds quickly pushes most of the constituent densities out 
into regions of near zero probability. In addition, the position of the apex of the fold, 
which is crucial to estimating the correct parameters, is quite difficult to get a handle 
on without including many terms in the representation of the density. 



5.5 An algorithm in one dimension 

In the previous section we saw that traditional techniques do not seem to do a reasonable
job modeling the effects of folding on parameter estimation. Since there seems to be
no simple way of adequately representing a probability density as it gets folded, we
resort to a Monte Carlo representation of densities near folded regions, meaning that
the appropriate densities are sampled at many different points in state and parameter
space and this data is used as a representation for the density itself. The eventual hope is
that we will only have to examine a fraction of the data using computationally-intensive
techniques like Monte Carlo, since we know that only a few sections of data are really
sensitive to parameter values.

Though the ideas are simple, the actual implementation of such parameter estimation
techniques is not as easy as one might think because of numerical problems associated with
chaotic systems. In this section we examine the basics of how to apply Monte Carlo-type
analysis to chaotic systems by looking at an algorithm for one-dimensional noninvertible
systems. An algorithm for higher dimensional invertible systems will be considered in
section 5.6.



5.5.1 Motivation 

Let us consider the following question. Suppose we are given a family of maps of the
interval, $f_p : I_x \to I_x$, for $p \in I_p$ and noisy measurement data, $\{y_n\}$, such that:

$$x_{n+1} = f_{p_0}(x_n)$$

$$\text{and} \quad y_n = x_n + v_n,$$

where $x_n \in I_x$ for all $n$, $I_x \subset \mathbb{R}$, and $p_0 \in I_p \subset \mathbb{R}$ such that $f_{p_0}$ is chaotic. Suppose also
that the $v_n$'s are zero mean Gaussian independent variables with covariance matrix, $R_n$,
and that we have some a priori knowledge about the value of $p_0$. Given this information,
we would like to use the state samples, $\{y_n\}$, to get a better estimate of $p_0$. Let us
assume for the moment that we have plenty of computing power and time. What sort
of method is likely to extract the most possible information about the parameters of the
system given the state data?

The first thing one might try is to simply start picking parameter values, $p$, near $p_0$
and initial conditions, $x$, near $y_0$, and attempt to iterate orbits of the form $\{f_p^i(x)\}_{i=0}^{n}$
to see if they come close to $\{y_i\}_{i=0}^{n}$. If no orbit of $f_p$ follows $\{y_i\}_{i=0}^{n}$ then we know that
$p_0 \neq p$. As we increase $n$, many orbits of the form $\{f_p^i(x)\}_{i=0}^{n}$ diverge from $\{y_i\}_{i=0}^{n}$,
and we can gradually discard more and more values of $p$ as candidates for the actual
parameter value, $p_0$.
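The brute-force idea just described can be sketched as follows. The logistic map, the uniform sampling of initial conditions around $y_0$, the fixed tolerance `tol`, and all function names are assumptions made for illustration; the algorithm developed in this chapter replaces the fixed tolerance with probability densities and works in stages.

```python
def logistic(x, p):
    return p * x * (1.0 - x)


def orbit(x, p, n):
    """Return [x, f_p(x), ..., f_p^n(x)]."""
    points = [x]
    for _ in range(n):
        points.append(logistic(points[-1], p))
    return points


def shadows(p, y, tol, half_width=0.01, samples=201):
    """True if some sampled orbit of f_p stays within tol of every y_i."""
    step = 2.0 * half_width / (samples - 1)
    for k in range(samples):
        x = y[0] + (k - samples // 2) * step     # initial conditions near y_0
        candidate = orbit(x, p, len(y) - 1)
        if all(abs(xi - yi) <= tol for xi, yi in zip(candidate, y)):
            return True
    return False


y = orbit(0.3, 3.9, 25)          # noise-free "observations" for the demonstration
candidates = [3.7, 3.8, 3.9]
survivors = [p for p in candidates if shadows(p, y, tol=0.02)]
```

Here $p = 3.7$ is discarded after only a few iterates because the data reach values near $0.95$, which no orbit of $f_{3.7}$ can approach to within the tolerance. Direct iteration over long spans is numerically unreliable for chaotic maps, which is one reason the data are processed in limited stages.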



5.5.2 Overview 

In order to implement this idea, we first need some criteria for measuring how closely orbits
of $f_p$ follow $\{y_i\}$ and some rules for how to use this information to decide whether the
parameter value, $p$, should remain a candidate for our estimate of $p_0$. Basically, we want
$p$ to be eliminated if the best shadowing orbit, $\{f_p^i(x)\}$, of $f_p$ is far enough away from
$\{y_i\}$ that it is highly unlikely that sampling $\{f_p^i(x)\}$ could have resulted in $\{y_i\}$, given
the expected measurement noise. As discussed earlier, one way to do this is to think
of $x_n$, $y_n$, and $p_0$ as random variables and to consider a probability density function of
the form, $P(x_0, p_0 | y^n)$. Our goal will be to numerically sample such probability densities
and use the results to extract information about the parameters. This is accomplished
in stages, since we can only reliably compute orbits for a limited number of iterates at
once. Information from various stages can then be combined to construct the composite
density, $P(x_0, p_0 | y^n)$, for increasing values of $n$.



So, for example, let us examine how to analyze the $k$th stage of observations, consisting
of the data, $\{y_i\}_{i=N_k}^{N_{k+1}}$, where $N_{k+1}$ is chosen to be as far away from $N_k$ as possible
without greatly affecting the numerical computation of orbits shadowing $\{y_i\}_{i=N_k}^{N_{k+1}}$. Let
$y[a, b] = (y_a, y_{a+1}, \ldots, y_b)$ be a vector of state data. We begin by picking values of $p$ near
$p_0$. For each of these parameter samples, $p$, we pick a number of initial conditions, $x$,
and iterate out orbits of the form $\{f_p^{i-N_k}(x)\}_{i=N_k}^{n}$ for $n > N_k$ to evaluate $P(x_{N_k} | p_0, y[N_k, n])$
for increasing values of $n$.⁹

For each $n > N_k$ we want to keep track of the set of initial conditions $x \in I_x$ such
that $P(x_{N_k} | p_0, y[N_k, n])$ is above a threshold value. If $P(x_{N_k} | p_0, y[N_k, n])$ is below the
threshold for some value of $x_{N_k}$, we discard the orbit $\{f_p^i(x_{N_k})\}_{i=0}^{n-N_k}$ because it is too
far from $\{y_i\}_{i=N_k}^{n}$ and attempt to repopulate a region, $U_k(p, n) \subset I_x$, in state space with
more initial conditions, where $U_k(p, n)$ is constrained so that $x \in U_k(p, n)$ implies that
$P(x_{N_k} | p_0, y[N_k, n])$ is above the threshold. Some care must be taken in figuring out how
to choose $U_k(p, n)$ so that new initial conditions can be generated effectively. Without
care, these regions develop Cantor-set-like structure that is difficult to deal with.

After collecting information from various stages, we then recursively combine the
information from consecutive stages (similar to probabilistically combining densities in
the Kalman filter) in order to determine the appropriate overall statistics for concatenated
orbits over multiple stages. After combining information, at the end of each stage
we also take a look at the composite densities for the various parameter samples, $p$.
Values of $p$ whose densities are too low are thrown out, since this means that $f_p$ has
no orbits which closely shadow $\{y_i\}_{i=0}^{N_{k+1}}$. The surviving parameter set, i.e., the set in
parameter space still being considered for the parameter estimate, must then be repopulated
with new parameter samples. The statistics of the new parameter samples may be
determined through a combination of interpolation with nearby parameter samples and
recomputation of the statistics of nearby stages. Because of the asymmetrical behavior in



9 Note that $P(x_{N_k} | p_0, y[N_k, n])$ is sufficient to determine $P(x_{N_k}, p_0 | y^n)$ for any particular value of $p_0$,
since

$$P(x_{N_k}, p_0 | y^n) = P(x_{N_k} | p_0, y^n) P(p_0)$$

where $P(p_0)$ is a normalizing factor quantifying a priori information about the parameters.

Figure 5.3: This block diagram illustrates the main steps in the proposed estimation algorithm
for one-dimensional systems. The algorithm breaks up the data in sections called "stages."
The diagram above shows the basic steps the algorithm takes in analyzing each stage of data.

5.5.3 Implementation

Below we explain various aspects of the algorithm in more depth. Note that unless
otherwise indicated, $x_n$, $y_n$, and $p_0$ refer to random variables in the discussion below.

Evaluating probability densities 

The first thing we must address is how to compute the values of the relevant densities.
From (5.24) we have that:

$$P(x_0, p_0 | y^n) = \frac{P(y_n | x_0, p_0)}{P(y_n | y^{n-1})} P(x_0, p_0 | y^{n-1}) \tag{5.33}$$

Expanding the right hand side of this equation recursively we have:

$$P(x_0, p_0 | y^n) = K_1 P(x_0, p_0) \prod_{i=0}^{n} N(y_i; f_{p_0}^i(x_0), R_i) \tag{5.34}$$

where $K_1$ is some constant and $P(x_0, p_0)$ is the probability density representing a priori
knowledge about the values of $x_0$ and $p_0$, while $N(y_i; f_{p_0}^i(x_0), R_i)$ is the value of a
Gaussian density with mean $f_{p_0}^i(x_0)$ and covariance matrix $R_i$ evaluated at $y_i$. In the
limit where no a priori knowledge about $x_0$ is available, the weighting factor, $P(x_0, p_0)$,
reduces to $P(p_0)$, reflecting a priori information about the parameters. Then, taking
the natural log of (5.34) we get that:

$$\log[P(x_0, p_0 | y^n)] = K_2 + \log[P(p_0)] - \frac{1}{2} \sum_{i=0}^{n} (f_{p_0}^i(x_0) - y_i)^T R_i^{-1} (f_{p_0}^i(x_0) - y_i) \tag{5.35}$$

where $K_2$ is a constant. Note that except for the extra term corresponding to the a
priori distribution for $p_0$, maximizing (5.35) is essentially the same as minimizing a least
squares criterion. Also note that for any particular value of $p_0$ we have from (5.35) that:

$$\begin{aligned}
\log[P(x_0 | p_0, y^n)] &= \log[P(x_0, p_0 | y^n)] - \log[P(p_0)] \\
&= K_2 - \frac{1}{2} \sum_{i=0}^{n} (f_{p_0}^i(x_0) - y_i)^T R_i^{-1} (f_{p_0}^i(x_0) - y_i)
\end{aligned} \tag{5.36}$$
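As an illustration of (5.36), the following is a minimal sketch for a scalar map, assuming the quadratic map $f_p(x) = px(1-x)$ studied later in the report and a constant noise variance in place of $R_i$. The function names are hypothetical, and the additive constant $K_2$ is dropped because only differences of log densities are used by the thresholding conditions.

```python
from typing import Callable, Sequence


def quadratic_map(p: float, x: float) -> float:
    """One iterate of f_p(x) = p x (1 - x)."""
    return p * x * (1.0 - x)


def log_density(x0: float, p0: float, ys: Sequence[float], noise_var: float,
                f: Callable[[float, float], float] = quadratic_map) -> float:
    """Return log P(x0 | p0, y^n) - K_2 as in (5.36) for a scalar state.

    Assumes every measurement has the same variance `noise_var`
    (an assumption of this sketch, standing in for R_i).
    """
    total = 0.0
    x = x0
    for y in ys:
        residual = x - y          # f_{p0}^i(x0) - y_i
        total += residual * residual / noise_var
        x = f(p0, x)              # advance the candidate orbit one step
    return -0.5 * total
```

A candidate orbit that reproduces the data exactly scores zero, and any mismatch lowers the score, which is the least squares behavior noted after (5.35).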

Representing and dividing state regions 

Given a parameter sample, $p_0$, and stage, $k$, we need to specify how to choose sample
trajectories, $\{f_{p_0}^i(x_{N_k})\}_{i=0}^{n-N_k}$, to shadow $\{y_i\}_{i=N_k}^{n}$ for $n \in \{N_k, N_k+1, \ldots, N_{k+1}\}$.
For each $n \in \{N_k, N_k+1, \ldots, N_{k+1}\}$ we want to keep track of the set of interesting
initial conditions, $U_k(p_0, n) \subset I_x$, from which to choose states, $x_{N_k}$, to evaluate the
density, $P(x_{N_k} | p_0, y[N_k, n])$. We require that if $x_{N_k} \in U_k(p_0, n)$, then $x_{N_k}$ must satisfy the
following thresholding condition:

$$\log[P(x_{N_k} | p_0, y[N_k, n])] > \sup_{x_{N_k} \in I_x} \{\log[P(x_{N_k} | p_0, y[N_k, n])]\} - \sigma^2 \tag{5.37}$$

for some constant, $\sigma > 0$, so that the orbit, $\{f_{p_0}^i(x_{N_k})\}_{i=0}^{n-N_k}$, follows sufficiently close to
$\{y_i\}_{i=N_k}^{n}$. $\sigma$ can be interpreted to be a measure of the maximum number of standard
deviations $x_{N_k}$ is allowed to be from the best shadowing orbit of the map, $f_{p_0}$. This
interpretation arises since if $P(x_{N_k} | p_0, y^n)$ were Gaussian, condition (5.37) would be
satisfied by all states, $x_{N_k}$, within $\sigma$ standard deviations of the mean, $\bar{x}_{N_k}(p_0, n) =
\int_{x_{N_k} \in I_x} x_{N_k} P(x_{N_k} | p_0, y[N_k, n]) \, dx_{N_k}$. 10 To be reasonably sure we don't accidentally eliminate
important shadowing orbits of $f_{p_0}$ close to $\{y_i\}$, we might choose, for example, $\sigma$ to
be between 8 and 12.

Given a parameter sample, $p_0$, let $V_k(p_0, n) \subset I_x$ represent the set of all $x_{N_k} \in
I_x$ satisfying (5.37). Recall that $U_k(p_0, n)$ represents the set of points from which we
will choose new sample initial conditions, $x_{N_k}$. We know that we want $U_k(p_0, n) \subset
V_k(p_0, n)$, but problems arise if we always attempt to saturate the set $V_k(p_0, n)$ with
sample trajectories. For low values of $n$, $V_k(p_0, n)$ is an interval. In this case, let
$U_k(p_0, n) = V_k(p_0, n)$ and we can simply choose initial conditions, $x_{N_k}$, at random inside
$V_k(p_0, n)$ to generate samples of $P(x_{N_k} | p_0, y[N_k, n])$. As $n$ gets larger, $V_k(p_0, n)$ tends
to shrink as $f_{p_0}^{n-N_k}$ expands regions in state space and more trajectory samples get
discarded from consideration for failing to satisfy (5.37). However, as long as $V_k(p_0, n)$
is an interval, we continue to set $U_k(p_0, n) = V_k(p_0, n)$, since it is not hard to keep track of
$V_k(p_0, n)$ to repopulate the region with new trajectory samples.

A problem occurs, however, because of the folding around turning points. If the
region, $f_{p_0}^{m}(V_k(p_0, m))$, contains a turning point for some integer $m > 0$, then as $n$ grows
larger than $m$, $V_k(p_0, n)$ may split into two distinct intervals, $V_k^{+}(p_0, n)$ and $V_k^{-}(p_0, n)$.
Folding causes the two separate regions to get mapped into each other by $f_{p_0}^{m+1}$ (i.e.,
$f_{p_0}^{m+1}(V_k^{+}(p_0, n)) = f_{p_0}^{m+1}(V_k^{-}(p_0, n))$). In addition, the new intervals, $V_k^{+}(p_0, n)$ and
$V_k^{-}(p_0, n)$, can also be split apart into other separate intervals by similar means as $n$
increases. In principle, this sort of phenomenon can happen arbitrarily many times,
turning $V_k(p_0, n)$ into a collection of thin, disjoint intervals. This makes it difficult
to keep up with a characterization of $V_k(p_0, n)$, and makes it difficult to know how to
choose new initial conditions, $x_{N_k} \in V_k(p_0, n)$, to replace trajectory samples that have
been eliminated.

Instead of attempting to keep up with all the separate areas of $V_k(p_0, n)$, and trying
to repopulate all these areas with new state samples, we let $U_k(p_0, n) \subset V_k(p_0, n)$ be
the single connected interval of $V_k(p_0, n)$ where $P(x_{N_k} | p_0, y[N_k, n])$ is a maximum. 11 We
know that the separate areas of $V_k(p_0, n)$ eventually get mapped into each other, so
there is no way that one of the separate areas of $V_k(p_0, n)$ can end up shadowing $\{y_i\}$
if no states in $U_k(p_0, n)$ can shadow $\{y_i\}$. Since we are primarily interested in the best
shadowing orbit of $f_{p_0}$, keeping up with orbits with initial conditions in $U_k(p_0, n)$ is
adequate.

10 One might think that this Gaussian assumption may be a bad one and that in general we might, for
instance, want to make sure that we kept a set, $Q$, of initial states such that $\Pr(x_{N_k} \in Q | p_0) > 1 - \alpha$
for $\alpha > 0$ small, where $\Pr(X)$ is the probability of event $X$. However, in practice, the condition (5.37)
is simpler to evaluate and works well for all the problems encountered. The choice of thresholding value
is not critically important as long as it is not so high that close shadowing orbits are thrown away from
consideration.

11 Strictly speaking we actually want to maximize $P(x_{N_{k-1}} | p_0, y[N_{k-1}, N_k]) P(x_{N_k} | p_0, y[N_k, n])$ (see
the section on how to combine data). In practice this almost always amounts to maximizing

Finally, note also that it is sometimes obvious that the parameter sample, $p_0$, cannot
possibly be the correct parameter value. This happens if no orbit of $f_{p_0}$ comes anywhere
close to shadowing $\{y_i\}$. In this case we can immediately discard parameter sample, $p_0$,
from consideration.
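The choice of $U_k(p_0, n)$ as the single connected interval around the density maximum can be sketched as follows. This is only an illustration under stated assumptions: the sample initial conditions are sorted in increasing order, the supremum in (5.37) is approximated by the maximum over the samples, and "connected" is approximated by an unbroken run of neighboring samples that all pass the threshold. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def best_connected_interval(xs: Sequence[float], log_dens: Sequence[float],
                            sigma: float) -> Tuple[float, float]:
    """Return (lo, hi) bounds of the run of surviving samples containing the maximum.

    xs       : sample initial conditions, sorted ascending (assumption)
    log_dens : log density of each sample, e.g. from (5.36)
    sigma    : threshold constant of (5.37)
    """
    best_idx = max(range(len(xs)), key=lambda i: log_dens[i])
    cutoff = log_dens[best_idx] - sigma ** 2

    lo = best_idx
    while lo > 0 and log_dens[lo - 1] > cutoff:
        lo -= 1                      # extend left while neighbors survive
    hi = best_idx
    while hi < len(xs) - 1 and log_dens[hi + 1] > cutoff:
        hi += 1                      # extend right while neighbors survive
    return xs[lo], xs[hi]
```

Samples that pass the threshold but are separated from the maximum by a rejected sample are ignored, mirroring the decision not to track every disjoint piece of $V_k(p_0, n)$.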

Deciding what parameters to keep 

We need to evaluate how good a parameter sample is, so we know which parameter
samples to keep and which parameters to eliminate as a possible choice for the parameter
estimate. After the completion of stage $k$, we evaluate a parameter sample, $p_0$, according
to the following criterion:

$$L_{k+1}(p_0) = \sup_{x_{N_k} \in I_x} \{\log[P(x_{N_k}, p_0 | y^{N_{k+1}})]\} \tag{5.38}$$

which is what one would expect if we were interested in obtaining a MAP estimate. Let
$V_k$ be the set of parameter samples valid at the start of the $k$th stage. We will eliminate
a parameter sample, $p_0$, after the $k$th stage if it satisfies the following inequality:

$$L_{k+1}(p_0) < \sup_{p' \in V_k} \{L_{k+1}(p')\} - \sigma^2$$

where $\sigma > 0$ is some measure of the number of standard deviations $p_0$ is allowed to be
from the most likely parameter value.
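The elimination rule above amounts to a simple filter over the current parameter samples. A minimal sketch, assuming the scores $L_{k+1}(p)$ have already been computed and are stored in a dictionary keyed by parameter value (the container and function name are illustrative choices, not taken from the report):

```python
from typing import Dict


def prune_parameters(scores: Dict[float, float], sigma: float) -> Dict[float, float]:
    """Keep parameter samples whose L_{k+1} is within sigma**2 of the best score.

    scores maps each parameter sample p to L_{k+1}(p); a sample is dropped
    when L_{k+1}(p) < max_p' L_{k+1}(p') - sigma**2.
    """
    best = max(scores.values())
    return {p: L for p, L in scores.items() if L >= best - sigma ** 2}
```

For example, with `sigma = 10` a sample scoring 500 below the best is discarded, while one scoring 90 below survives into the next stage.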

Choosing the number of iterates per stage 

The necessity of breaking up orbits into stages is apparent, since orbits can be reliably
computed only for a limited number of iterates. We now explain how to determine the
number of iterates in each stage. Let $p_{MAP}(k)$ be the MAP estimate for $p_0$ at the
beginning of stage $k$ (i.e., $p = p_{MAP}(k)$ is the parameter sample that maximizes $L_k(p)$ over
all $p \in V_k$). We want to choose $N_{k+1}$ to be as large as possible provided we are still
able to reliably compute orbits of the form $\{f_{p_{MAP}(k)}^i(x_{N_k})\}_{i=0}^{N_{k+1}-N_k}$ to shadow $\{y_i\}_{i=N_k}^{N_{k+1}}$.

Suppose that $x_{N_k} \in U_k(p_0, n)$. A reasonable measure of the number of iterates we
can reliably compute for an orbit like $\{f_{p_0}^i(x_{N_k})\}_{i=0}^{n-N_k}$ is given by the size of $U_k(p_0, n)$. If
$U_k(p_0, n)$ is small, this implies that small changes or errors in initial state get magnified
to magnitudes on the order of the measurement noise. Since we need to compute states
to accuracies better than the measurement noise, it makes sense to pick $N_{k+1}$ so that
$U_k(p_0, N_{k+1})$ is a few orders of magnitude above the precision of the computer.
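One rough way to apply this rule without tracking $U_k(p_0, n)$ explicitly is to estimate its width by linearization. The sketch below is an assumption-laden illustration, not the report's procedure: it takes the quadratic map, approximates the width of $U_k$ after $n$ iterates as $2\sigma$ noise standard deviations divided by the accumulated stretching $|(f_p^n)'(x)|$ along a reference orbit, and stops once that width falls within a safety margin of machine precision. The name `stage_length` and the default `margin` are hypothetical.

```python
import sys


def stage_length(x_start: float, p: float, noise_std: float,
                 sigma: float = 10.0, margin: float = 1e3,
                 max_iter: int = 10_000) -> int:
    """Estimate how many iterates fit in one stage for f_p(x) = p x (1 - x).

    Returns the largest n for which the linearized width of U_k stays above
    margin * machine epsilon (or max_iter if that never happens).
    """
    width_floor = margin * sys.float_info.epsilon
    stretch = 1.0                                  # running |(f_p^n)'(x_start)|
    x = x_start
    for n in range(1, max_iter + 1):
        stretch *= abs(p * (1.0 - 2.0 * x))        # |f_p'(x)| at the current state
        x = p * x * (1.0 - x)
        # Linearized width of the set of initial states still within
        # sigma noise standard deviations after n iterates.
        if stretch > 0.0 and 2.0 * sigma * noise_std / stretch < width_floor:
            return n - 1
    return max_iter
```

The linearization is unreliable when the reference orbit passes very close to the turning point, which is exactly where the full sampling procedure is needed, so this should be read only as a first estimate of the stage length.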



$P(x_{N_k} | p_0, y[N_k, n])$ because $U_k(p_0, n)$ is generally much smaller than $f_{p_0}^{N_k - N_{k-1}}(U_{k-1}(p_0, N_k))$.



One complication that can arise is that the sequence of states, $\{y_{N_k}, y_{N_k+1}, \ldots\}$,
might correspond to an especially parameter-sensitive stretch of points, so that there
may be no orbit of $f_{p_{MAP}(k)}$ that shadows the data, $\{y_i\}_{i=N_k}^{n}$. In this case, we cannot use
the size of $U_k(p_{MAP}(k), n)$ to determine $N_{k+1}$. Instead of using $p_{MAP}(k)$, we pick the next
best parameter sample in $V_k$, $p'(k)$, where $p'(k)$ maximizes $L_k(p)$ over all $p \in V_k$ other than
$p_{MAP}(k)$. We then try the same procedure with $p'(k)$ that we described for $p_{MAP}(k)$.
Similarly, if $f_{p'(k)}$ cannot shadow the data, we choose another parameter value from $V_k$, and
so forth. Eventually some parameter value in $V_k$ must work, or else either: (1) there are
not enough parameter samples, or (2) $p_0$ is not in the parameter space region specified
upon entrance to the $k$th stage. This can especially be a problem at the beginning of
the estimation process when the parameters are not known well, and parameter samples
are more sparse in parameter space. The solution is to choose parameters intelligently,
choosing varying numbers of parameter samples in different regions of parameter space
and in different situations (for example, to initialize the estimation routine).

Combining data from stages 

As in the Kalman filter, we want to build a recursive algorithm so that data summarizing
information for stages 1 through $k - 1$ can be combined with information
from stage $k$ to produce results which summarize all knowledge about stages 1 through
$k$. Specifically, suppose that $y[N_k, N_{k+1}] = (y_{N_k}, y_{N_k+1}, \ldots, y_{N_{k+1}})$ represents the state
samples of the $k$th stage. We propose to compute $L_{k+1}(p_0)$ using information given in
$L_k(p_0)$, $P(x_{N_{k-1}} | p_0, y[N_{k-1}, N_k])$, and $P(x_{N_k}, p_0 | y[N_k, N_{k+1}])$. Then all information about
stages 1 through $k$ can be represented by $L_{k+1}(p_0)$ and $P(x_{N_k} | p_0, y[N_k, N_{k+1}])$.

From (5.38) we see that $L_k(p_0)$ depends only on $P(x_{N_{k-1}}, p_0 | y^{N_k})$ evaluated on the
orbit that best shadows the first $N_k$ state samples. In other words, if $\{\hat{x}_{i|N_k}\}_{i=0}^{N_k}$ is the
best shadowing orbit based on the first $N_k$ state samples, then from (5.38) and (5.35):

$$\begin{aligned}
L_k(p_0) &= \log[P(x_{N_{k-1}} = \hat{x}_{N_{k-1}|N_k}, p_0 | y^{N_k})] \\
&= K_2 + \log[P(p_0)] - \frac{1}{2} \sum_{i=0}^{N_k} (\hat{x}_{i|N_k} - y_i)^T R_i^{-1} (\hat{x}_{i|N_k} - y_i)
\end{aligned} \tag{5.39}$$

One key thing to notice is that $U_{k-1}(p_0, N_k)$ and $U_k(p_0, N_{k+1})$ should be very small
compared to the measurement noise, $R_i$, for any $i$. This is a reasonable assumption as
long as none of the measurements have relative accuracies on the order of the machine
precision. Therefore we can approximate $\hat{x}_{i|N_k}$ with $\hat{x}_{i|N_{k+1}}$ for $i \in \{0, 1, \ldots, N_{k-1}\}$ in
(5.39) and if we let:

$$A_k(p_0) = \log[P(p_0)] - \frac{1}{2} \sum_{i=0}^{N_{k-1}} (\hat{x}_{i|N_{k+1}} - y_i)^T R_i^{-1} (\hat{x}_{i|N_{k+1}} - y_i) \tag{5.40}$$

then from (5.36), (5.39), and (5.40):

$$L_k(p_0) \approx A_k(p_0) + \sup_{x_{N_{k-1}} \in I_x} \{\log[P(x_{N_{k-1}} | p_0, y[N_{k-1}, N_k])]\} \tag{5.41}$$

and also:

$$\begin{aligned}
L_{k+1}(p_0) \approx{} & A_k(p_0) - \frac{1}{2} \sum_{i=N_{k-1}}^{N_k} (\hat{x}_{i|N_{k+1}} - y_i)^T R_i^{-1} (\hat{x}_{i|N_{k+1}} - y_i) \\
& + \sup_{x_{N_k} \in I_x} \{\log[P(x_{N_k} | p_0, y[N_k, N_{k+1}])]\}
\end{aligned} \tag{5.42}$$

We can now evaluate (5.42) given the appropriate representations of $L_k(p_0)$, $P(x_{N_{k-1}} | p_0, y[N_{k-1}, N_k])$,
and $P(x_{N_k} | p_0, y[N_k, N_{k+1}])$. The term on the right hand side of (5.42) involving $\sup_{x_{N_k} \in I_x}$
can be approximated from our representation of the density $P(x_{N_k} | p_0, y[N_k, N_{k+1}])$ by
simply taking the maximum density value over all the trajectory samples. Likewise
$A_k(p_0)$ can be evaluated from (5.41) in a similar manner given $L_k(p_0)$. The trajectory
$\{\hat{x}_{i|N_{k+1}}\}_{i=N_{k-1}}^{N_k}$ can be approximated by looking for the trajectory sample $x' \in U_{k-1}(p_0, N_k)$
in the representation for $P(x_{N_{k-1}} | p_0, y[N_{k-1}, N_k])$ that makes $f_{p_0}^{N_k - N_{k-1}}(x')$ as close to
$U_k(p_0, N_{k+1})$ as possible. Then let $\hat{x}_{i|N_{k+1}} = f_{p_0}^{i - N_{k-1}}(x')$ for $i \in \{N_{k-1}, \ldots, N_k\}$.

Note that this assumes that $U_k(p_0, N_{k+1}) \subset f_{p_0}^{N_k - N_{k-1}}(U_{k-1}(p_0, N_k))$. If this is not true
then no orbit of $f_{p_0}$ adequately shadows $\{y_i\}_{i=0}^{N_{k+1}}$, and we can throw out the parameter
sample $p_0$.
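The update (5.42) is a short computation once its three ingredients are available. The sketch below assumes scalar states and a constant noise variance in place of $R_i$; `bridge_orbit` stands for the approximated orbit $\hat{x}_{i|N_{k+1}}$ over $i = N_{k-1}, \ldots, N_k$ and `stage_log_dens` for the sampled values of $\log P(x_{N_k} | p_0, y[N_k, N_{k+1}])$. All names are hypothetical.

```python
from typing import Sequence


def combine_stage(A_k: float, bridge_orbit: Sequence[float],
                  bridge_obs: Sequence[float], noise_var: float,
                  stage_log_dens: Sequence[float]) -> float:
    """Approximate L_{k+1}(p0) following the structure of (5.42).

    A_k            : summary of stages before k, as in (5.40)
    bridge_orbit   : best-shadowing states over the previous stage
    bridge_obs     : the measurements y_i over the same indices
    stage_log_dens : log densities of the stage-k trajectory samples; their
                     maximum stands in for the supremum in (5.42)
    """
    misfit = sum((x - y) ** 2 / noise_var
                 for x, y in zip(bridge_orbit, bridge_obs))
    return A_k - 0.5 * misfit + max(stage_log_dens)
```

Replacing the supremum by the maximum over samples is the approximation described in the text; its quality depends on how densely the surviving region is sampled.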

Choosing new parameter samples and evaluating associated densities 

Once a parameter sample is deleted because it does not satisfy (5.37), a new parameter
sample must be chosen along with the appropriate statistics and densities. We want to
choose new parameters after stage $k$ so that they adequately describe $L_{k+1}(p)$ over the
surviving parameter range. In other words, we attempt to choose new parameters to fill
in gaps in parameter space where nearby parameter samples, $p_1$ and $p_2$, for example,
have very different values of $L_{k+1}(p_1)$ and $L_{k+1}(p_2)$.

Once we choose the new parameter sample, $p^*$, we need to evaluate the relevant
statistics, namely $L_{k+1}(p^*)$ and $P(x_{N_k} | p_0 = p^*, y[N_k, N_{k+1}])$. We could, of course, do
this by going back through all of the data $\{y_i\}_{i=0}^{N_{k+1}}$ and sampling the appropriate densities.
This, however, would be quite time-consuming, and would likely not reveal much more
information about the parameters than we could get by much simpler means, assuming
that enough parameter samples are used. Instead, we interpolate $A_k(p^*)$ given $A_k(p)$
for all valid parameter samples, $p \in V_k$. We then compute $P(x_{N_{k-1}} | p_0, y[N_{k-1}, N_k])$ and
$P(x_{N_k} | p_0, y[N_k, N_{k+1}])$ by iterating trajectory samples. We can then evaluate $L_{k+1}(p^*)$
according to (5.42).
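The two steps just described, picking a new sample where $L_{k+1}$ changes most between neighbors and interpolating $A_k$ at the new sample, might look like the following sketch. The text does not specify the placement rule or the interpolation scheme; the midpoint rule and linear interpolation used here are assumptions, as are the function names.

```python
from bisect import bisect_left
from typing import Sequence


def pick_new_parameter(params: Sequence[float], L_values: Sequence[float]) -> float:
    """Return the midpoint of the adjacent pair with the largest jump in L_{k+1}.

    params must be sorted ascending and contain at least two samples.
    """
    gaps = [abs(L_values[j + 1] - L_values[j]) for j in range(len(params) - 1)]
    j = max(range(len(gaps)), key=gaps.__getitem__)
    return 0.5 * (params[j] + params[j + 1])


def interpolate_A(p_star: float, params: Sequence[float],
                  A_values: Sequence[float]) -> float:
    """Linearly interpolate A_k(p_star) from A_k at the surviving samples."""
    if not params[0] <= p_star <= params[-1]:
        raise ValueError("p_star lies outside the surviving parameter range")
    j = bisect_left(params, p_star)
    if params[j] == p_star:
        return A_values[j]
    w = (p_star - params[j - 1]) / (params[j] - params[j - 1])
    return (1.0 - w) * A_values[j - 1] + w * A_values[j]
```

After interpolation, $L_{k+1}(p^*)$ would still be obtained from (5.42) by iterating fresh trajectory samples for the two most recent stages, as the text describes.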

Efficiency concerns 

This algorithm is not designed to be especially efficient. Rather, it is intended to 
try to extract as much information about the parameters of a one-dimensional map as 
reasonably possible. For a discussion of some performance issues, see the next section 
where we apply the algorithm to the family of quadratic maps.

One way to increase the efficiency of this algorithm would be to attempt to locate the
sections of the data orbit that are sensitive to parameters, and perform the appropriate 
analysis only on these observations. For maps of the interval this corresponds to locating 
sections of orbit that pass near turning points. The problem, however, is not as obvious 
in higher dimensions. Rather than address this issue in a one-dimensional setting, in 
section 5.6 we will look at how this might be done in higher dimensional systems using 
linear analyses. 



5.6 Algorithms for higher dimensional systems 

In this section we develop an algorithm to estimate the parameters of general nonuniformly
hyperbolic systems. Suppose we are given a family of maps, $f_p : M \to M$, for
$p \in I_p$ and noisy measurement data, $\{y_n\}$, where:

$$x_{n+1} = f_{p_0}(x_n) \quad \text{and} \quad y_n = x_n + v_n$$

where $x_n \in M$ for all $n$, $M$ is some metric space, and $p_0 \in I_p \subset \mathbb{R}$ such that $f_{p_0}$ is
nonuniformly hyperbolic. Suppose also that the $v_n$'s are zero mean Gaussian independent
random variables with covariance matrix, $R_n$, and that we have some a priori knowledge
about the value of $p_0$. Our goal in this section is to develop an algorithm to estimate $p_0$
given $\{y_n\}$.

Like the algorithm for one-dimensional systems discussed in the last section, the 
estimation technique presented here is based on an analysis of probability densities using 
a Monte-Carlo-like approach. The idea, however, is to avoid the heavy computational 
burden typical of Monte Carlo methods by selectively choosing which pieces of data 
to fully analyze. Since most of the state data in a nonuniformly hyperbolic system
apparently do not contribute much information about the parameters of the system, the 
objective is to quickly bypass the vast majority of data, but still construct extremely 
accurate parameter estimates by performing intensive analyses on the small sections of 
data that really matter. 



5.6.1 Overview 

The parameter estimation algorithm has two primary components. The first component
sifts through the data to locate orbit sections that might be sensitive. The second
component performs an analysis on the parameter-sensitive data sections to determine
the parameter estimate.

The data is first scanned using a linear estimator like the square root extended
Kalman filter. As described in chapter 4, linear analyses can indicate the presence of
degeneracy in the hyperbolic structure of a system. In the case of a recursive linear filter,
degeneracies corresponding to parameter-sensitive stretches of data are indicated by a
sharp drop in the covariance matrix of the estimate. We simply run the data through
the appropriate filter, look for a drop in covariance estimate over a small number of
iterates, and note the appropriate sections of data for further analysis.

The second component of the estimation technique consists of a Monte-Carlo-based
technique. The underlying basis for this analysis is similar to what was described in
section 5.5 for one-dimensional systems. Basically the estimate is constructed by using 
information obtained by sampling the appropriate probability densities in state and 
parameter space. There are, however, a few important differences to point out from the 
one-dimensional algorithm. First, since the systems are invertible, we iterate the map 
both forwards and backwards in time 12 in order to obtain information about probability 
densities. Also the higher dimensionality of the systems causes a few problems with how 
to represent and choose regions of state space in which to generate samples. Finally 
instead of concatenating consecutive stages by matching initial and final conditions of 
sample trajectories, we generate only one stage for each section of sensitive state data. 
The stages are separated in space and time, so there is no matching of initial and final 
conditions. 



5.6.2 Implementation 

In this section we detail some of the basic issues that need to be addressed in order to 
implement the proposed algorithm. 

Top-level scan filter 

The data is first scanned by a square root extended Kalman filter. The implementation
is straightforward: simply process the data and look for drops in the error covariance
matrix. There are two parameters that may be adjusted: (1) a parameter, $N$, to set the
number of iterates (time scale) to look for degeneracies, (2) a parameter, $\alpha$, to set the
threshold that governs whether a section of data is sent to the Monte-Carlo algorithm
for further analysis. $\alpha$ is expressed in terms of a ratio of the square roots of the variances
of the parameter error.
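The scan itself reduces to comparing the filter's parameter error standard deviation across a window of $N$ iterates. The sketch below assumes the filter has already been run and has produced one parameter standard deviation per iterate; the Kalman filter is not implemented here, and the function and argument names are hypothetical.

```python
from typing import List, Sequence


def flag_sensitive_sections(param_std: Sequence[float], window: int,
                            ratio: float) -> List[int]:
    """Return iterates n where the parameter std dev dropped sharply.

    param_std : square root of the parameter error variance after each iterate
    window    : number of iterates N over which to look for a drop
    ratio     : threshold; n is flagged when
                param_std[n] / param_std[n - window] < ratio
    """
    return [n for n in range(window, len(param_std))
            if param_std[n] < ratio * param_std[n - window]]
```

For instance, with `window=2` and `ratio=0.5`, the sequence `[1.0, 0.99, 0.98, 0.97, 0.2, 0.19, 0.19]` flags iterates 4 and 5, and the data around those iterates would be passed to the Monte Carlo stage.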

Evaluating densities 

Let $y^n = (y_0, y_1, \ldots, y_n)$. To estimate parameters, we are interested in densities of
the form $P(x_0, p_0 | y^n)$. From (5.36) we have that:

$$\begin{aligned}
\log[P(x_0, p_0 | y^n)] &= \log P(p_0) + \log[P(x_0 | p_0, y^n)] \\
&= K_2 + \log P(p_0) - \frac{1}{2} \sum_{i=0}^{n} (f_{p_0}^i(x_0) - y_i)^T R_i^{-1} (f_{p_0}^i(x_0) - y_i)
\end{aligned} \tag{5.43}$$

where $K_2$ is a constant.

12 For lack of a better term we use "time" to refer to increasing iterations of the discrete map $f_p$. For
example, applying $f_p$ to a state will sometimes be called mapping forwards in time and applying $f_p^{-1}$
will be referred to as mapping backwards in time.

Information about probability densities is obtained by sampling in state and parameter
space. For a MAP estimator, we expect that the relative merit of various parameter
samples, $p_0$, would be evaluated according to the formula:

$$\begin{aligned}
L(p_0 | y^n) &= \sup_{x_0 \in I_x} \log[P(x_0, p_0 | y^n)] \\
&= \log P(p_0) + \sup_{x_0 \in I_x} \log[P(x_0 | p_0, y^n)] \\
&= K_2 + \log P(p_0) - \frac{1}{2} \inf_{x_0 \in I_x} \left\{ \sum_{i=0}^{n} (f_{p_0}^i(x_0) - y_i)^T R_i^{-1} (f_{p_0}^i(x_0) - y_i) \right\}.
\end{aligned}$$

In general, however, we will only consider a few sets of observations in the sequence,
$\{y_i\}$. For example, suppose that for any integer, $n > 0$, the linear filter has identified
$k(n)$ groups or stages of measurements that may be sensitive to parameters. Then for
each $j \in \{1, 2, \ldots, k(n)\}$, define $Y_j = \{y_i | i \in S_j\}$ to be a set of sensitive measurements
that have been singled out by the linear filter, where the sets, $S_j \subset \mathbb{Z}$, represent the indices
that can be used to identify the measurements. From our arguments in chapters 3
and 4 we expect that most of the information about the parameters of the system can
be extracted locally by looking at each group of measurements individually. Thus we
consider the statistic, $L_{k(n)}(p_0)$, as a replacement for $L(p_0 | y^n)$ where:

$$\begin{aligned}
L_{k(n)}(p_0) &= K_2 + \log P(p_0) + \sum_{j=1}^{k(n)} \sup_{x_0} \log[P(x_0, p_0 | Y_j)] \\
&= K_4(k(n)) + \log P(p_0) - \frac{1}{2} \sum_{j=1}^{k(n)} \left[ \inf_{x_0} \left\{ \sum_{i \in S_j} (f_{p_0}^i(x_0) - y_i)^T R_i^{-1} (f_{p_0}^i(x_0) - y_i) \right\} \right]
\end{aligned}$$

and $K_4(k(n))$ depends only on $k(n)$.

As in the one-dimensional case, we eliminate parameter samples, $p_0$, that fail to
satisfy a thresholding condition: $L_{k(n)}(p_0) > \sup_{p' \in V_{k(n)}} \{L_{k(n)}(p')\} - \sigma^2$ for some $\sigma > 0$,
where $V_{k(n)}$ is the set of parameter samples at stage $k(n)$. In practice, if $Y_j$ for $j \in
\{1, 2, \ldots, k(n)\}$ are really the main measurements sampling parameter-sensitive areas of
local folding, then $L_{k(n)}(p_0)$ in fact mirrors $L(p_0 | y^n)$, at least with respect to eliminating
parameter values that are not favored. This is the most important property of $L_{k(n)}(p_0)$
with respect to parameter estimation, since, as in the one-dimensional case, we would
like to choose the parameter estimate, $\hat{p}_n$, to reflect the extremum of the surviving
parameter range where $L(p_0 | y^n)$ drops off rapidly.

Stages 

Suppose that the linear filter decides that the data, $\{y_i\}$, might be sensitive near iterate
$i = N_k$. Given parameter sample, $p_0$, we begin to examine the density, $P(x_{N_k} | p_0, y[N_k - n, N_k + n])$,
for increasing values of $n$ by generating trajectory samples of the form
$\{f_{p_0}^i(x_{N_k})\}_{i=-n}^{n}$ and evaluating:

$$\log[P(x_{N_k} | p_0, y[N_k - n, N_k + n])] = K - \frac{1}{2} \sum_{i=-n}^{n} (f_{p_0}^i(x_{N_k}) - y_{N_k+i})^T R_{N_k+i}^{-1} (f_{p_0}^i(x_{N_k}) - y_{N_k+i})$$

for some constant, $K$. As in the one-dimensional case, for each $n$ we keep only trajectory
samples, $x_{N_k}$, that satisfy a thresholding condition like:

$$\log[P(x_{N_k} | p_0, y[N_k - n, N_k + n])] > \sup_{x_{N_k} \in M} \{\log[P(x_{N_k} | p_0, y[N_k - n, N_k + n])]\} - \sigma^2 \tag{5.44}$$

for some $\sigma > 0$. As $n$ is increased, we replace trajectory samples that have been thrown
out for failing to satisfy (5.44) by trying new initial conditions chosen at random from
a bounded region in state space which we will denote $B_0(p_0, N_k, n)$. $B_0(p_0, N_k, n) \subset M$
plays a role analogous to $U_k(p_0, N_{k+1})$ in the one-dimensional case, except that it is a
multidimensional neighborhood instead of simply an interval.

Representing sample regions 

Given a specific parameter sample, $p_0$, we now discuss how to choose trajectory
samples. In particular we examine the proper choice of $B_0(p_0, N_k, n)$ for $n > 0$. For
any $n > 0$, the objective is to choose $B_0(p_0, N_k, n)$ so that it is a reasonably efficient
representation of the volume of space occupied by $X_0(p_0, N_k, n)$, where $X_0(p_0, N_k, n) \subset M$
is a bounded region in state space such that $x \in X_0(p_0, N_k, n)$ satisfies (5.44). We want to
choose a simple representation for $B_0(p_0, N_k, n)$ so that $B_0(p_0, N_k, n)$ is large enough that
$B_0(p_0, N_k, n) \supset X_0(p_0, N_k, n)$, but small enough so that if an initial condition $x$ is chosen
at random from $B_0(p_0, N_k, n)$ then there is high probability that $x \in X_0(p_0, N_k, n)$. We
get an idea for what $X_0(p_0, N_k, n)$ is by iterating old trajectory samples of the density,
$P(x_{N_k} | p_0, y[N_k - (n - 1), N_k + (n - 1)])$, and deleting the initial conditions that do not
satisfy (5.44). Based on these trajectory samples, we choose $B_0(p_0, N_k, n)$ to be a simple
parallelepiped enclosing the surviving initial conditions. As new trajectory samples are
chosen by picking random initial conditions in $B_0(p_0, N_k, n)$, we get a better idea about
the geometry of $X_0(p_0, N_k, n)$ and can in turn choose a more efficient $B_0(p_0, N_k, n)$ to
generate additional trajectory samples.
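The enclose-and-resample loop described above can be sketched in a few lines. The version below assumes the box is axis-aligned and that new initial conditions are drawn uniformly inside it; the text only says a simple parallelepiped (a box) is used, so both choices, and the function names, are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def bounding_box(points: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (lower corner, upper corner) of the box enclosing the survivors."""
    dims = range(len(points[0]))
    lo = [min(pt[d] for pt in points) for d in dims]
    hi = [max(pt[d] for pt in points) for d in dims]
    return lo, hi


def sample_in_box(lo: Sequence[float], hi: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw one candidate initial condition uniformly from the box."""
    return [rng.uniform(a, b) for a, b in zip(lo, hi)]
```

Each candidate drawn this way would then be iterated forwards and backwards and kept only if it satisfies (5.44); the fraction of candidates that survive indicates how efficiently the box represents $X_0(p_0, N_k, n)$.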

In our implementation of the algorithm, $B_0(p_0, N_k, n)$ is always represented as a
box. This method has the advantage that it is extremely simple and also makes it

Figure 5.4: Here we illustrate why there can be multiple regions shadowing the same orbit.
Near areas of folding, two regions, A and B, can be separate, yet can get asymptotically 
mapped toward each other both forwards and backwards in time. Note that in the picture, A 
and B are located at intersections of the same stable and unstable manifolds. This situation 
must be dealt with when sampling probability densities and searching for optimal shadowing 
orbits. 

Avoiding degenerate sample regions

The other problem is that $X_0(p_0, N_k, n)$ tends to collapse onto a lower dimensional
surface as $n$ gets large. This is due to the fact that the map, $f_{p_0}^n$, generally contracts and
expands some directions in state space more than others. Our ability to compute orbits
like $\{f_{p_0}^i(x)\}_{i=-n}^{n}$ is related to the largest expansion factor of either $f_{p_0}^n$ or $f_{p_0}^{-n}$ (e.g., the
square root of the largest eigenvalue of $Df_{p_0}^n(x)^T Df_{p_0}^n(x)$). If $X_0(p_0, N_k, n)$ collapses onto a lower dimensional
surface, that means that across the width of the surface of $X_0(p_0, N_k, n)$, tiny differences
in initial conditions get magnified to the level of the measurement noise by either $f_{p_0}^n$
or $f_{p_0}^{-n}$. For example, if $f_{p_0}^n$ is responsible for collapsing $X_0(p_0, N_k, n)$ onto a surface
with thickness comparable to the machine precision, then we cannot expect to choose
trajectory samples of the form $f_{p_0}^i(x)$ for $i > n$ without experiencing debilitating roundoff
errors.

Ideally, as $n$ increases, we would like $X_0(p_0, N_k, n)$ to converge toward smaller and
smaller ball-shaped regions while maintaining approximately the same thickness in every
direction. Besides having better numerical behavior than regions that collapse onto a
lower-dimensional surface, it is also much easier to represent such regions and choose
initial conditions inside these regions.

There is a degree of freedom that is available and can be used to adjust the shape
of the region where initial conditions are sampled. We can simply choose to iterate
trajectory samples further backwards in time than forwards in time or vice versa. In
other words, if $f_{p_0}^n$ expands one direction much more than $f_{p_0}^{-n}$ expands any direction in
state space, then we may iterate orbits of the form $\{f_{p_0}^i(x)\}_{i=-n_a}^{n_b}$ where $n_a > n_b$. The
relative sizes of $n_a$ and $n_b$ can then be adjusted to match the rates of convergence of the
region where initial conditions are sampled.

In practice it can be a bit tedious to adjust the number of iterates in sample trajectories
and attempt to figure out what effect iterating forwards or backwards has on the
shape of a particular region in state space. A better way to approach the problem is to
examine regions of the form:

$$X_j(p_0, N_k, n) = f_{p_0}^j(X_0(p_0, N_k, n))$$

for $j \in \{-n, -n+1, \ldots, n-1, n\}$. For any particular $p_0$, $N_k$, and $n$, if $X_0(p_0, N_k, n)$ starts
to become an inadequate region for choosing new sample trajectories, we simply search
for a $j$ so that the region, $X_j(p_0, N_k, n)$, is not degenerate in any direction in state space
(this process is described in the next section). We can then pick new initial conditions,
$x \in X_j(p_0, N_k, n)$, and iterate orbits of the form $\{f_{p_0}^i(x)\}_{i=-n-j}^{n-j}$ in order to evaluate the
proper densities. Note that instead of deleting sample trajectories according to (5.44),
new sample trajectories are now thrown out if they fail to satisfy

$$\log[P(x_{N_k+j} | p_0, y[N_k - n, N_k + n])] > \sup_{x_{N_k+j} \in M} \{\log[P(x_{N_k+j} | p_0, y[N_k - n, N_k + n])]\} - \sigma^2.$$

This procedure is thus equivalent to sampling trajectories from $X_0(p_0, N_k, n)$, except
that it is better numerically.

Evaluating and choosing new sample regions

We now describe how to decide when an initial condition sample region like $X_{j_0}(p_0, N_k, n)$
has become inadequate and how to choose a new $j^* \in \{-n, -n+1, \ldots, n-1, n\}$ so that
$X_{j^*}(p_0, N_k, n)$ makes an effective sample region.

Basically, as long as we can pick $B_0(p_0, N_k, n)$ so that most initial conditions, $x$,
chosen from $B_0(p_0, N_k, n)$ satisfy $x \in X_{j_0}(p_0, N_k, n)$, then things are satisfactory, and
there is no need to search for a new sample region. However, suppose that it becomes
difficult to choose $x \in B_0(p_0, N_k, n)$ so that $x \in X_{j_0}(p_0, N_k, n)$. It might be the case
that $X_j(p_0, N_k, n)$ is collapsing in multiple directions, and we simply cannot increase $n$
without running into numerical problems. If this is not the case, then we first search
for whether $X_{j_0}(p_0, N_k, n)$ can be divided into two separate high density regions. If so,
then we concentrate on one of these regions. Otherwise we have to search for a new
$j^* \in \{-n, -n+1, \ldots, n-1, n\}$ and a new sample region, $X_{j^*}(p_0, N_k, n)$.

This is done in the following manner. We take the trajectory samples marking
the region, $X_{j_0}(p_0, N_k, n)$, and iterate them forwards and backwards in time looking at
samples of

$$X_j(p_0, N_k, n) = f_{p_0}^{j - j_0}(X_{j_0}(p_0, N_k, n))$$

for $j \in \{-n + j_0, -n + j_0 + 1, \ldots, n + j_0\}$. We would like to pick $j^*$ to be a value for
$j$ such that $X_j(p_0, N_k, n)$ is not degenerate, so that it is easy to pick $B_0(p_0, N_k, n)$ such
that $x \in B_0(p_0, N_k, n)$ implies $x \in X_j(p_0, N_k, n)$ with high probability.

We would also like to pick $j^*$ so that $X_{j^*}(p_0, N_k, n)$ is a well balanced region and
is not degenerate in any direction. The first thing to check is to simply generate the
box, $B_j(p_0, N_k, n)$, enclosing $X_j(p_0, N_k, n)$ for each $j$ and make sure that none of its side
lengths are degenerate. This condition is not adequate, however, since one could end
up with a $j^*$ in which $X_{j^*}(p_0, N_k, n)$ is actually long and thin but curls back on itself
so that its bounding box, $B_j(p_0, N_k, n)$, is not long and thin. In order to check for this
case, one thing to do is to partition the box, $B_j(p_0, N_k, n)$, into a number of subregions
and check to see how many of these subregions are actually occupied by the trajectory
samples demarking $X_j(p_0, N_k, n)$. If very few subregions are occupied then we have to
reject $j$ as a possible choice for $j^*$. An adequate choice for $j^*$ can then be made using this
constraint along with information about the ratio of the side lengths of $B_j(p_0, N_k, n)$.
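The subregion test in the last paragraph can be illustrated with a simple occupancy count. The sketch assumes an axis-aligned bounding box split into a uniform grid with the same number of cells along each axis; the grid resolution and the occupancy level below which a candidate $j$ is rejected are tuning choices that the text leaves open, and the function name is hypothetical.

```python
from typing import Sequence


def occupied_fraction(points: Sequence[Sequence[float]], lo: Sequence[float],
                      hi: Sequence[float], cells_per_side: int) -> float:
    """Fraction of grid cells in the box [lo, hi] containing at least one sample."""
    occupied = set()
    for pt in points:
        idx = []
        for x, a, b in zip(pt, lo, hi):
            width = b - a
            cell = int((x - a) / width * cells_per_side) if width > 0 else 0
            idx.append(min(cell, cells_per_side - 1))   # keep x == b in the last cell
        occupied.add(tuple(idx))
    return len(occupied) / cells_per_side ** len(lo)
```

Samples lying along a thin curve inside a square box occupy only a small fraction of the cells (one quarter for a diagonal line on a 4 by 4 grid), whereas samples filling the box occupy nearly all of them, so a low fraction signals a degenerate region even when the box side lengths look balanced.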
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Chapter 6 



Numerical results 



In this chapter we present results from various numerical experiments. In particular, we 
demonstrate the effectiveness of the algorithms proposed in chapter 5 for estimating the 
parameters of chaotic systems. 

The algorithms are applied to four different systems. The first system, the quadratic 
map, is the same one-dimensional system that was examined in chapter 3 of this report. 
The second system we look at is the Henon map, a dissipative two-dimensional mapping 
with a strange attractor. The third system is the standard map, an area-preserving map 
that exhibits chaotic behavior. Finally, in contrast to the first three systems, which are
all nonuniformly hyperbolic, we also take a brief look at the Lozi map, one of the few 
nonpathological examples of a chaotic map exhibiting uniformly hyperbolic behavior. 

We find that with the exception of the Lozi map, the other maps in this chapter 
all exhibit asymmetrical shadowing behavior on the parameter space of the map. Fur- 
thermore, this asymmetrical behavior always seems to favor one direction in parameter 
space regardless of locality in state space. 

Note that many of the basic comments and explanations applicable to all the systems
are included in section 6.1 on the quadratic map, where the issues are first encountered.



6.1 Quadratic map 

In this section we describe numerical experiments on the quadratic map: 

f_p(x) = p x (1 - x)        (6.1)

where x ∈ [0, 1] and p ∈ [0, 4]. For values of p between 3.57 and 4.00, numerical
experiments suggest that there are a large number of parameter values where (6.1) exhibits
chaotic behavior. In particular we will concentrate on parameters near p = 3.9. For
p_0 = 3.9, numerical results indicate that f_{p_0} has a Lyapunov exponent of about 0.49.
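
The Lyapunov exponent quoted above can be checked with a short calculation. The following is a minimal sketch that averages log|f'(x)| = log|p(1 - 2x)| along one orbit; the function name, the orbit length, and the burn-in length are illustrative assumptions, not values from the report.

```python
import math

def lyapunov_quadratic(p, x0=0.4, n=100_000, burn_in=1_000):
    """Estimate the Lyapunov exponent of f_p(x) = p x (1 - x) by
    averaging log|f_p'(x)| along a single orbit."""
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = p * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        deriv = abs(p * (1.0 - 2.0 * x))
        # Tiny floor avoids log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(deriv, 1e-300))
        x = p * x * (1.0 - x)
    return total / n

print(lyapunov_quadratic(3.9))
```

For p = 3.9 the estimate should land close to the value of about 0.49 reported in the text.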

Let us begin by presenting a summary of our results for one particular orbit of the
quadratic map, the orbit with initial condition x_0 = 0.4. These results are summarized
in figure 6.1. Our discussion in this section will seek to answer the following questions:
(1) what each of the lines in figure 6.1 means, (2) why each of the data sets graphed has
the behavior shown, and (3) what we expect the asymptotic behavior of each of the
traces to be if the simulations were continued for larger numbers of data points.

Figure 6.1: This graph summarizes results related to estimating the parameter p in the
quadratic map for data generated using the initial condition x_0 = 0.4.

6.1.1 Setting up the experiment 

In order to test parameter estimation algorithms numerically, we first pick a parameter
value, p_0, and generate a sequence of data points, {y_i}_{i=0}^{n}, to represent noisy measurements
of f_{p_0}. This is done by choosing an initial condition, x_0, and numerically iterating the
orbit {x_i = f_{p_0}^i(x_0)}_{i=0}^{n}. The noisy measurements, {y_i}_{i=0}^{n}, are then simulated by setting
y_i = x_i + v_i, where the v_i's are randomly generated values for i ∈ {0, 1, ..., n}. For
the experiments in this section, the v_i's are chosen to simulate independent identically
distributed Gaussian random variables with standard deviation 0.001.
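
The data generation step can be sketched in a few lines. This is an illustration only: the function name, the use of NumPy's random generator, and the seed are assumptions; the parameter value 3.9, the initial condition 0.4, and the noise level 0.001 follow the text.

```python
import numpy as np

def simulate_quadratic_data(p0=3.9, x0=0.4, n=1000, noise_std=0.001, seed=0):
    """Return the noiseless orbit x_0..x_n of f_p0 and the noisy
    measurements y_i = x_i + v_i with i.i.d. Gaussian v_i."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = p0 * x[i] * (1.0 - x[i])   # iterate the quadratic map
    v = rng.normal(0.0, noise_std, size=n + 1)
    return x, x + v

x, y = simulate_quadratic_data()
print(x[:3], y[:3])
```

Only the noisy sequence y would be handed to an estimator; the noiseless orbit x is kept here so that the actual estimation error can be computed afterwards.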

We then use the simulated data, {y_i}_{i=0}^{n}, as input to the parameter estimation
algorithm to see whether the algorithm can recover the parameter value that was used to
generate the data in the first place. In general the parameter estimation algorithm may
also use a priori information, such as an initial parameter estimate along with some measure
of how good that estimate is. In this chapter we generally choose the initial parameter
estimate to be a random value within 0.025 of p_0.



6.1.2 Kalman filter 

Let us now examine what happens when we apply the square root extended Kalman 
filter to the quadratic map. We investigate the Kalman filter for data generated from 
four different initial conditions: x_0 ∈ {0.1, 0.2, 0.3, 0.4}. 

Figure 6.2 illustrates perhaps the most important feature of the simulations, namely 
that the Kalman filter eventually "diverges." Each trace in figure 6.2 represents the 
average of ten different runs using ten different sets of numerically generated data from 
each initial condition. On the y-axis we plot the ratio of the actual error of the
parameter estimate to the estimated mean square error obtained from the covariance
matrix of the filter. If the filter is working, we generally expect this ratio to be close
to 1. Note also that the filter seems to start fine, but then the error jumps to many
"standard deviations" of the expected error and never returns to the normal operating 
range. 

In fairness, plotting an average can be somewhat misleading because the average 
might be skewed by outliers and runs that fail massively. There are in fact significant 
differences from run to run. However, numerous experiments with the Kalman filter
suggest that divergence almost always occurs if one allows the filter to run long
enough. In addition, none of the standard techniques for addressing divergence
difficulties (e.g., exponential forgetting of data) seems able to solve the problem adequately.
It seems that one is stuck with either letting the filter diverge, or somehow decreasing
confidence in the covariance matrix so much that accurate estimates cannot be attained.
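
To make the divergence diagnostic concrete, the sketch below runs an extended Kalman filter on the augmented state (x, p) of the quadratic map and returns the parameter estimate together with the standard deviation the filter believes it has. Note the assumptions: this is a plain covariance-form filter with a Joseph-form update, not the square root filter used in the experiments, and the function name, initialization, and absence of process noise are illustrative choices.

```python
import numpy as np

def ekf_quadratic(y, p_init, p_var, noise_std=0.001):
    """Plain EKF on z = (x, p) for x_{i+1} = p x_i (1 - x_i), y_i = x_i + v_i.
    Returns the parameter estimates and the filter's own standard deviations."""
    z = np.array([y[0], p_init])
    P = np.diag([noise_std ** 2, p_var])
    H = np.array([[1.0, 0.0]])            # only the state x is observed
    R = noise_std ** 2
    p_hat = np.empty(len(y))
    p_std = np.empty(len(y))
    p_hat[0], p_std[0] = z[1], np.sqrt(P[1, 1])
    for i in range(1, len(y)):
        x, p = z
        # Predict through the map, linearizing about the current estimate.
        F = np.array([[p * (1.0 - 2.0 * x), x * (1.0 - x)],
                      [0.0, 1.0]])
        z = np.array([p * x * (1.0 - x), p])
        P = F @ P @ F.T
        # Measurement update, Joseph form for numerical robustness.
        S = (H @ P @ H.T).item() + R
        K = (P @ H.T) / S
        z = z + K[:, 0] * (y[i] - z[0])
        A = np.eye(2) - K @ H
        P = A @ P @ A.T + R * (K @ K.T)
        p_hat[i], p_std[i] = z[1], np.sqrt(P[1, 1])
    return p_hat, p_std
```

Given data generated with a known p_0, the quantity `np.abs(p_hat - p0) / p_std` plays the role of the ratio described above: values near 1 suggest a consistent filter, while values that climb and stay large suggest divergence.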

Figure 6.2: This figure shows results for applying the square root extended Kalman filter to
estimating the parameters of the quadratic map with p = 3.9. Each trace represents the average
ratio of the actual parameter estimate error to the estimated mean square error as calculated
by the Kalman filter over 10 different trials. The different traces represent experiments based
on orbits with different initial conditions. Note how the error jumps up to levels on the order
of 10 or higher, indicating divergence.

In figure 6.3 we plot the actual error of the Kalman filter versus the number of state
samples used on a log-log scale. Again the errors plotted are the average of the errors of
ten different runs. We see that the error makes progress for a little while but then
divergence occurs. The Kalman filter rarely makes any real progress after divergence occurs,
not even exhibiting the 1/√n improvement characteristic of purely stochastic convergence
(i.e., the filter is not getting any information from the dynamics), since the overconfident
covariance matrix prohibits the parameter estimate from moving much unless the state
data drifts many deviations away from what the filter expects.¹



6.1.3 Analysis of proposed algorithm 

We now examine the performance of the algorithm presented in section 5.5. The results
in this section reflect an implementation of the algorithm based on 9 samples in parameter
space and 50 samples in state space (250 when representations for different stages
are being combined). Each stage is iterated until the state sample region is of length
1 × 10^-9 or less. We use a = 8 so that the sample spaces in state and parameters are 8
deviations wide.

One of the most striking things about the results of the algorithm is the asymmetry 
of the merit function, L(p), in parameter space. As shown in figure 6.4, the parameter 
merit function typically shows a very sharp dropoff on the low end of the parameter 
space. Based on this asymmetry we choose the parameter estimate to be the parameter 
value at which the sharp dropoff in L(p) occurs. 
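
The way an estimate is read off the merit function can be sketched as follows. This is an assumption-laden illustration: it supposes L(p) has already been evaluated on a grid of parameter samples, normalizes it so that its maximum is 0, discards samples with L(p) < -64 (the threshold quoted for a = 8 in the caption of figure 6.4), and takes the lowest surviving sample as the estimate. The function and argument names are hypothetical.

```python
import numpy as np

def estimate_from_dropoff(params, merit, width=8.0):
    """Return (lower, upper) bounds of the surviving parameter range.
    The lower bound sits at the sharp dropoff and serves as the estimate."""
    params = np.asarray(params, dtype=float)
    merit = np.asarray(merit, dtype=float)
    merit = merit - merit.max()          # normalize so the maximum is 0
    keep = merit >= -width ** 2          # delete samples with L(p) < -64
    surviving = params[keep]
    return surviving.min(), surviving.max()

# Synthetic merit function: a cliff below p = 3.9, a gentle slope above it.
idx = np.arange(21)
params = 3.9 + (idx - 10) * 1e-4
merit = np.where(idx < 10, -1000.0, -10.0 * (idx - 10))
print(estimate_from_dropoff(params, merit))
```

In this synthetic example the lower bound coincides with the location of the cliff, while the upper bound is set by where the gentle slope crosses the deletion threshold.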

In figure 6.5 we see the performance of the algorithm on data based on the initial
conditions x_0 ∈ {0.1, 0.2, 0.3, 0.4}. Each trace in the figure represents one run of the
algorithm. Rerunning the algorithm multiple times on data based on the same initial 
condition produces similar results, except that the scanning linear filter sometimes defers 
a few more or less points to the Monte Carlo estimator for analysis. 

Note how the error in the estimate tends to converge in sudden large jumps over small 
numbers of iterates, while staying approximately constant in between these jumps. The 
large decreases in error level occur when the data orbit makes a close approach to the 
turning point, causing a stretch of state samples to become sensitive to parameters. 
This is not simply a product of discretization in the algorithm, since the Monte Carlo 
estimator sometimes makes no gains at all, while other times great gains are made, and 
a large number of parameter samples are deleted on the lower end of the parameter 
sample range. 

One might wonder what this graph would look like if we were to extend it for arbitrarily
many iterates. Consider the theory presented in chapter 3. First of all, it is likely
that f_{p_0} satisfies the linking condition, and therefore exhibits a parameter shadowing
property. This means there is essentially an end to the progress that can be made in
the estimate based on dynamical information, after which stochastic convergence would
be the rule. However, there is evidence that the level of accuracy at which this effect
becomes important is probably many orders of magnitude smaller than the level
we are dealing with.²

¹ Interestingly, this actually does occur, apparently near areas of folding, since the filter models the
folding phenomena so poorly. Occasionally this can even cause the filter to get back in sync, moving
the parameter estimate just the right amount to lower the error. This seems to be quite rare, however.

Figure 6.3: Graph of the average error in the parameter estimate as computed by the square
root extended Kalman filter applied to the quadratic map with parameter value p = 3.9. Data
represents average error over 10 runs.

Figure 6.4: Asymmetry in the parameter space of the quadratic map: Here we graph the
parameter merit function L(p) after processing 2500 iterates of an orbit with initial condition
x_0 = 0.4. The merit function is normalized so that L(p) = 0 at the maximum. Since a = 8,
a parameter sample, p, is deleted if L(p) < -64. This sort of asymmetrical merit function is
typical of all orbits encountered in the quadratic map, Henon map, and standard map.

This leads us to ask: assuming that we do not see the effects of parameter shadowing, 
how does the parameter estimation accuracy converge with respect to n, the number of 
state samples processed by the algorithm? As conjectured in section 3.5, we believe that 
the accuracy converges at a rate proportional to 1/n^2. A line with a slope of -2 is drawn 
in figure 6.5 to suggest the conjectured asymptotic behavior. Note that the conjecture 
seems plausible from the picture, although more data would be needed to really make 
the evidence convincing. 
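
One simple way to compare an observed error trace with a conjectured power law is to fit a straight line on a log-log scale. The sketch below does this with an ordinary least-squares fit; the function name and the synthetic error traces are illustrative assumptions, not data from the experiments.

```python
import numpy as np

def loglog_slope(n, err):
    """Least-squares slope of log10(err) against log10(n)."""
    slope, _intercept = np.polyfit(np.log10(n), np.log10(err), 1)
    return slope

n = np.logspace(2, 5, 20)
print(loglog_slope(n, 3.0 / n ** 2))       # error proportional to 1/n^2
print(loglog_slope(n, 0.01 / np.sqrt(n)))  # purely stochastic 1/sqrt(n) error
```

A trace following the conjectured 1/n^2 rate gives a slope near -2, while purely stochastic convergence gives a slope near -1/2. Because the real traces improve in sudden jumps, a fit of this kind is only meaningful over many jumps.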

In figure 6.6 we show the error in the upper bound of the parameter range being
considered by the algorithm. While the lower bound of this range is used as the parameter
estimate, the upper bound has significantly different behavior. After an initial period,
the convergence of the upper bound is governed purely by stochastic means (i.e., without
any help from the dynamics). This is predicted by Theorem 3.4.2. Thus we expect that
the convergence will be on the order of 1/√n, as suggested by the line with a slope of -1/2
shown in the figure. The small jumps in the graphs for figure 6.6 are simply the result
of the discrete nature of how parameter space is sampled.

² It is difficult to calculate this directly, since it requires knowing the exact number of iterates it takes
an orbit from the turning point to return near the turning point. However, rough calculations suggest
that for most parameters around p_0 = 3.9 we expect that parameter shadowing would not be seen until
parameter deviations are less than 1 × 10^-50 for noise levels of 0.001.

Figure 6.5: Graph of the actual error in the parameter estimate of the proposed algorithm
when applied to data from the quadratic map with p = 3.9. A line of slope -2 is drawn on the
graph to indicate the conjectured asymptotic rate of convergence for the estimate.

Figure 6.6: Graph of the error in the upper bound of the parameter range being considered
by the proposed algorithm for the quadratic map with p = 3.9. A line with a slope of -1/2 is
drawn to indicate the expected asymptotic convergence of the error.



6.1.4 Measurement noise 

One other important question to ask is: what happens if we change the level of
measurement noise? The short answer is that the parameter estimate results presented here
are surprisingly insensitive to measurement noise. If we ignore the parameter shadowing
effects caused by close returns to the turning point (which we have already argued
are negligible for our experiments), then shadowing of any finite orbit is really an
all-or-nothing property in parameter space. Consider a stretch of state orbit with initial
condition x_0 close to the turning point. Then for a parameter value in the unfavored
direction, either the parameter value can shadow that stretch of orbit (presumably with
an initial condition closer to the turning point than x_0), or the parameter value cannot
shadow the orbit, in which case it loses track of the original orbit exponentially fast.
Asymptotically, the measurement noise actually makes no difference in the parameter
estimate other than through parameter shadowing effects caused by linking. Thus, once
the measurement noise is lower than a certain level, the actual measurement noise makes
very little difference in the accuracy of parameter estimates.

Measurement noise does have a large effect on figure 6.6, the upper parameter bound,
and the possibility of parameter shadowing caused by linking. If the measurement noise
is large, then there are likely to be more parameter shadowing effects caused by linking. On
the other hand, if the measurement noise is really small, then the asymmetrical effect in 
parameter space will in fact get drowned out for quite a while (until the sampled orbit 
comes extremely close to the turning point). In most reasonable cases however, the 
asymmetry in parameter space is likely to be quite important if we want to get accurate 
parameter estimates for reasonably large data sets. 



6.2 Henon map 

We now discuss numerical experiments with the Henon map: 

x_{n+1} = y_n + 1 - a x_n^2        (6.2)

y_{n+1} = b x_n                    (6.3)

where the state (x_n, y_n) ∈ R^2 and the parameter values, a and b, are invariant. For
parameter values a = 1.4 and b = 0.3, numerical evidence indicates the existence of a
chaotic attractor as shown in figure 6.7. See Henon [27] for a more detailed description
of the basic properties of the Henon map.

Figure 6.7: The Henon attractor for a = 1.4, b = 0.3. 

For the purposes of testing out parameter estimation algorithms, we fix b = 0.3 and 
attempt to estimate the parameter, a. State data is chosen from an orbit on the attractor 
of the Henon map. Noisy measurement data is generated using a state orbit and adding 
Gaussian noise with standard deviation 0.001 to each state value. 
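
A minimal sketch of this setup is given below, assuming the Henon map in the form x_{n+1} = y_n + 1 - a x_n^2, y_{n+1} = b x_n. The function names, the starting point (0, 0), the burn-in used to settle onto the attractor, and the seed are illustrative assumptions; the parameter values and the noise level follow the text.

```python
import numpy as np

def henon_orbit(a=1.4, b=0.3, x0=0.0, y0=0.0, n=1000, burn_in=100):
    """Iterate the Henon map and return n states after a burn-in period."""
    x, y = x0, y0
    for _ in range(burn_in):               # let the orbit settle onto the attractor
        x, y = y + 1.0 - a * x * x, b * x
    orbit = np.empty((n, 2))
    for i in range(n):
        orbit[i] = (x, y)
        x, y = y + 1.0 - a * x * x, b * x
    return orbit

def add_measurement_noise(orbit, noise_std=0.001, seed=0):
    """Add i.i.d. Gaussian noise to every state component."""
    rng = np.random.default_rng(seed)
    return orbit + rng.normal(0.0, noise_std, size=orbit.shape)

states = henon_orbit()
data = add_measurement_noise(states)
print(states[:2], data[:2])
```

With b held fixed at 0.3, only the parameter a would be treated as unknown by an estimator that is given `data`.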

Applying the square root extended Kalman filter to an orbit on the attractor results 
in figure 6.8. Observe that the filter diverges after about 15,000 iterates and does not 
recover. Note that the figure represents data for only one run. However, the results in 
figure 6.8 are representative for other sequences of data that we have tried. Although 
the performance of the Kalman filter is quite sensitive to noise, the key point is that 
divergence inevitably occurs, sooner or later, and the performance of the filter is generally 
unreliable. 

Note in figure 6.8 that the expected mean square error of the Kalman filter tends to 
change suddenly in jumps. In most cases these jumps probably correspond to sections 
of orbits that are especially sensitive to parameters because of folding in state space. 
The Kalman filter has a tough time handling the folding and typically divergence occurs 
during one of these jumps in the mean square error. This phenomenon is also apparent 
in figure 6.12. Note also that even after divergence, the parameter estimate sometimes 
changes by many standard deviations, indicating that the state space error residual must 
have been many deviations off. This again reflects the fact that the Kalman filter does 
not model folding well. 

We now apply the algorithm described in section 5.6. We choose to examine the top-level
scan filter every 20 iterates or so, looking for covariance matrix drops of around a
factor of 0.7 or less. The algorithm is relatively insensitive to changes in these parameters,
so their choice is not particularly critical.
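
One possible reading of this scanning rule is sketched below. It assumes the top-level filter exposes the variance of its parameter estimate at every iterate (the array name `param_var` is hypothetical) and flags a checkpoint whenever that variance has fallen to 0.7 times its value at the previous checkpoint, or lower. How the report's implementation actually measures the drop is not specified here, so this is an interpretation rather than the author's code.

```python
import numpy as np

def flag_sensitive_stretches(param_var, stride=20, drop_factor=0.7):
    """Return checkpoint indices k (multiples of stride) at which
    param_var[k] <= drop_factor * param_var[k - stride]."""
    param_var = np.asarray(param_var, dtype=float)
    checkpoints = np.arange(stride, len(param_var), stride)
    ratios = param_var[checkpoints] / param_var[checkpoints - stride]
    return checkpoints[ratios <= drop_factor]

# Synthetic variance trace: flat, then a sudden halving at iterate 50.
var = np.ones(100)
var[50:] = 0.5
print(flag_sensitive_stretches(var))
```

Stretches of data ending at a flagged checkpoint would then be handed to the more expensive Monte-Carlo analysis, while the rest pass through the scan filter untouched.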

Figure 6.8: This graph depicts the performance of the Kalman filter in estimating parameter
a for one sequence of noisy state data from the Henon map for a = 1.4 and b = 0.3. The data
was generated using the initial condition, (x_0, y_0) = (.633135448, .18940634), which is very close
to the attractor.

Figure 6.9: Asymmetry in the parameter space of the Henon map (with a = 1.4, b = 0.3):
Here we graph the parameter merit function L(a) after 200,000 iterates of an orbit with initial
condition on the attractor near x_0 = (.423, .208). Note that this merit function is actually
based on only the most sensitive 931 data points, since the linear filter threw out over 199,000
points.

As in the quadratic map, we find that the parameter merit function, L(a), is
asymmetrical in parameter space. Specifically, L(a) always has a sharp dropoff at its lower
bound, indicating that the Henon map favors higher values of the parameter a (see
figure 6.9). This property seems to be true for any orbit on the attractor. It also seems
to be true for all the parameter values of the Henon map that have been tried. We thus take
advantage of the asymmetry in parameter space in order to estimate the parameters of
the system.

Figure 6.10 shows the estimation effort for data generated from several different 
initial conditions on the attractor. The tick marks on the traces of the graph denote 
places where the top level scan filter deferred to the Monte-Carlo analysis. Note that 
as with the quadratic map, improvements in the estimate seem to be made suddenly.
Because relatively few points are analyzed by the Monte-Carlo technique,
and because the state samples scanned by the Kalman filter do not contribute to the
parameter estimate, almost all the gain in the parameter estimate must have been made
because of the dynamics.

Figure 6.10: Graph of the actual error of the parameter estimate for a using the proposed
algorithm on the Henon map (with a = 1.4 and b = 0.3). This graph contains results for
four different sets of data corresponding to four different initial conditions, all chosen on the
attractor of the system. The tick marks on each trace denote places where the top level Kalman
filter deferred to a Monte-Carlo-based approach for additional analysis.

6.3 Standard map 

We now discuss numerical experiments with the standard map: 

x_{n+1} = (x_n + y_n + K sin x_n) mod 2π        (6.4)
y_{n+1} = (y_n + K sin x_n) mod 2π              (6.5)

where K is the parameter of the system and the state, (x_n, y_n) ∈ T^2, lives on the
2-torus, T^2. The standard map is a Hamiltonian (area-preserving) system, and thus does
not have any attractors. Instead, for K = 1, for example, there is apparently a mixture
of invariant tori and seas of chaos where non-periodic orbits wander around. This is
illustrated in figure 6.11. See Chirikov [13] for more discussion on the properties of the
standard map.
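
The area-preserving property can be checked directly from the Jacobian of the map, whose determinant is (1 + K cos x) - K cos x = 1 for every state. The sketch below assumes the map in the form x_{n+1} = (x_n + y_n + K sin x_n) mod 2π, y_{n+1} = (y_n + K sin x_n) mod 2π; the function names are illustrative, and the starting point (0.05, 0.05) is the one quoted in the caption of figure 6.12.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def standard_map_step(x, y, K):
    """One iterate of the standard map on the 2-torus."""
    kick = K * np.sin(x)
    return (x + y + kick) % TWO_PI, (y + kick) % TWO_PI

def standard_map_jacobian(x, K):
    """Jacobian of one iterate with respect to (x, y); it does not depend on y."""
    c = K * np.cos(x)
    return np.array([[1.0 + c, 1.0],
                     [c, 1.0]])

x, y = 0.05, 0.05
for _ in range(5):
    x, y = standard_map_step(x, y, K=1.0)
    print(x, y, np.linalg.det(standard_map_jacobian(x, K=1.0)))
```

The determinant printed at each step stays at 1 up to rounding, which is the numerical counterpart of the statement that the map has no attractors.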




Figure 6.11: This picture shows various orbits of the standard map near K = 1. Note that 
since the space is a torus, the sides of the square are actually overlapping. This picture shows 
a number of different orbits. Some orbits fill out dark zones of chaotic behavior, while others 
remain on circular tori. 

In order to test the parameter estimation technique, we picked K = 1 and generated 
data based on orbits chosen to be in a chaotic region. To each state, we added random 
Gaussian measurement noise with standard deviation 0.001 to produce the data set. The 
results of applying the square root extended Kalman filter are shown in figure 6.12. As 
in the quadratic map and Henon map, we see that the Kalman filter diverges. 

In figure 6.14 we show the result of applying the algorithm in section 5.6 to the
standard map. In particular we investigate data for five different initial conditions in
the chaotic zone. In figure 6.13 we see the effects of asymmetric shadowing in the
standard map. The algorithm used in these trials is exactly the same as the one used for
the experiments with the Henon map (not even the tunable parameters of the algorithm
were changed). This indicates that the algorithm is relatively flexible and does not have
to be tuned precisely to generate reasonable results.

Figure 6.12: This graph depicts the performance of the square root extended Kalman filter
for estimating parameter K using one sequence of noisy state data from the standard map
with K = 1. The data was generated using the initial condition, (x_0, y_0) = (0.05, 0.05). This
initial condition results in a trajectory that wanders around in a chaotic zone.

Figure 6.13: Asymmetry in the parameter space of the standard map (with K = 1): Here
we graph the parameter merit function L(K) after 250,000 iterates of an orbit with initial
condition x_0 = (.423, .208).



6.4 Lozi map 



We now discuss numerical experiments with the Lozi map:

x_{n+1} = y_n + 1 - a|x_n|        (6.6)
y_{n+1} = b x_n                   (6.7)

where the state (x_n, y_n) ∈ R^2 and the parameter values, a and b, are invariant. The
Lozi map may be thought of as a piecewise linear version of the Henon map. Unlike
the Henon map, however, the Lozi map is uniformly hyperbolic where the appropriate
derivatives exist ([36]). For parameter values a = 1.7 and b = 0.5, the Lozi map has a
hyperbolic attractor ([36]) as shown in figure 6.15.

For the purposes of testing out parameter estimation algorithms, we fix b = 0.5 and 
attempt to estimate a. State data is chosen from an orbit on the attractor of the Lozi 
map. 
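
For comparison with the Henon sketch, the Lozi map can be iterated in the same way. The code below assumes the usual form x_{n+1} = y_n + 1 - a|x_n|, y_{n+1} = b x_n; the function name, the starting point (0, 0), and the burn-in length are illustrative assumptions. The only structural change from the Henon map is that the quadratic term is replaced by an absolute value, which makes the map piecewise linear.

```python
import numpy as np

def lozi_orbit(a=1.7, b=0.5, x0=0.0, y0=0.0, n=1000, burn_in=100):
    """Iterate the Lozi map and return n states after a burn-in period."""
    x, y = x0, y0
    for _ in range(burn_in):
        x, y = y + 1.0 - a * abs(x), b * x
    orbit = np.empty((n, 2))
    for i in range(n):
        orbit[i] = (x, y)
        x, y = y + 1.0 - a * abs(x), b * x
    return orbit

states = lozi_orbit()
print(states[:3])
```

Away from the line x = 0 the Jacobian is constant on each half-plane, with determinant -b, so the linearization used by a Kalman filter is exact on each side of the fold.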

In figure 6.16 we show the result of applying a square root extended Kalman filter
to the Lozi map. Unlike with the quadratic, Henon, and standard maps, the Kalman
filter applied to the Lozi map shows no signs of divergence, at least within 100,000
iterates. Note that the expected mean square parameter estimation
error falls almost exactly at the 1/√n rate indicated by pure stochastic convergence. Thus,
the dynamics make no asymptotic contribution to the parameter estimate, as one would
expect with a uniformly hyperbolic system.

Figure 6.14: This graph depicts the performance of the proposed algorithm for estimating
parameter K using one sequence of noisy state data from the standard map with K = 1.

Figure 6.15: The Lozi attractor for a = 1.7, b = 0.5.

We cannot really apply the algorithm from section 5.6 to the Lozi map because there 
are basically no sensitive orbit sections to investigate. The whole data set would pass 
right through the top level scanning filter without further review. However, even if we 
did force the Monte-Carlo algorithm to consider all the data points, we should again 
find purely stochastic convergence. 

Figure 6.16: This graph plots the performance of a square root extended Kalman filter in
estimating the parameter, a, in the uniformly hyperbolic Lozi map. The data here represents
the average over five runs based on data with different measurement noises but generated
using the parameters a = 1.7, b = 0.5, and the same initial condition on the attractor, near
(x_0, y_0) = (-.407, .430). Note the lack of divergence, and the fact that convergence is purely
stochastic.

Chapter 7 



Conclusions and future work 



7.1 Conclusions 

This report examines how to estimate the parameters of a chaotic system given obser- 
vations of the state behavior of the system. This problem is interesting in light of recent 
efforts to use chaotic systems for control and signal processing applications, and because 
of the possibilities for using parameter estimation in chaotic systems to develop ex- 
tremely sensitive measurement techniques. In order to evaluate the possible application 
of parameter estimation techniques to chaotic systems, we approached this report with 
two main goals in mind: (1) to examine the extent to which it is theoretically possible 
to estimate the parameters of a chaotic system, and (2) to develop an algorithm to do 
the parameter estimation. Significant progress was made on both objectives. 

7.1.1 Theoretical considerations 

In order to examine the theoretical possibilities of parameter estimation, we first broke 
chaotic systems down into two categories: structurally stable systems and systems that 
are not structurally stable. Structurally stable systems are probably not that interesting 
for measurement applications, since small perturbations in the parameters of these sys- 
tems do not result in qualitatively different state orbits. Consequently, we cannot extract 
asymptotic information about the parameters by observing the dynamics of structurally 
stable systems. 

The situation, however, is significantly different for systems that are not structurally 
stable. It turns out that the accuracy of parameter estimates is closely related to how 
orbits shadow each other for systems with slightly different parameter values. Thus, 
investigating the possibilities for parameter estimation required us to examine shadowing 
orbits. We discovered two interesting properties of shadowing orbits for parameterized 
families of nonuniformly hyperbolic systems. First, we found that there is often an 
asymmetrical shadowing behavior in the parameter space of these systems. That is, for 
one-parameter families of systems, it is typically much easier for systems with slightly 
higher parameter values to shadow orbits of systems with slightly lower parameter values 
(or vice versa). To illustrate this property in at least one case, we proved a specific 
shadowing result showing there truly is a preferred direction in parameter space for 
certain maps of the interval with negative Schwarzian derivative satisfying a Collet- 
Eckmann-like condition for state and parameter space derivatives. 

In addition, we also found that given a typical orbit of a nonuniformly hyperbolic sys- 
tem, most iterates of the orbit look locally hyperbolic, so that only a few rare stretches 
of the orbit are sensitive to parameters and exhibit the asymmetrical shadowing behav- 
ior in parameter space. These sensitive stretches of orbit seem to correspond to local 
nonhyperbolic folding behavior in state space. 



7.1.2 Parameter estimation algorithms 

In designing the new parameter estimation algorithm, we took advantage of the two 
theoretical observations described above. First, since most of the state data is apparently 
insensitive to parameter changes, we chose a fast top-level filter to scan through the 
data before concentrating on data that might be especially sensitive. The observation 
about asymmetrical shadowing behavior in parameter space is also extremely important, 
since it means that we have only to investigate the sharp boundary in parameter space 
between parameters that do and do not shadow the data in order to estimate what the 
true parameters are. 

The resulting algorithm is shown to perform significantly better than standard pa- 
rameter estimation algorithms like the extended Kalman filter. The extended Kalman 
filter typically diverges for most problems involving parameter estimation of chaotic 
systems. That is, the filter's covariance matrix becomes too confident about the es- 
timation error, effectively fixing the parameter estimate to an incorrect value without 
accepting new information from additional data points. This occurs because most of 
the information about the parameters of the system can be derived from observations 
that experience local folding in state space, a phenomenon that is inherently difficult to 
model with the local linearization techniques used by the Kalman filter. 

Our algorithm, on the other hand, does not have the divergence problem of the 
extended Kalman filter. In several numerical experiments we demonstrated that the 
algorithm described in this report achieved accuracies at least 3 to 4 orders of magnitude 
better than the extended Kalman filter before the experiment was stopped. Presumably, 
we should be able to get even better accuracies with the proposed algorithm simply by 
using more data points. Meanwhile, the divergence problem places a fairly strict bound 
on the accuracy of the extended Kalman filter. 




Furthermore, it appears that the estimation accuracy of the proposed algorithm
converges at a rate of 1/n^2 for certain systems (where n is the number of state samples
processed). This is interesting because it is significantly better than the 1/√n stochastic
convergence one might typically expect from most nonchaotic or structurally stable
systems. This indicates that the chaotic dynamics of a system can indeed help parameter 
estimation to some extent, and opens the door to some interesting possible applications 
like high precision measurement. 



7.2 Future work 



7.2.1 Theory 

Many questions still remain unanswered. First of all, I would like to know how to really 
characterize the ability of a system to shadow other systems. Is there a simple set 
of properties of a parameterized family of mappings that guarantee the asymmetry in 
parameter space shadowing behavior for a large class of mappings? How widespread 
is this asymmetrical behavior in parameter space shadowing? It seems likely that the 
situation is "generic" in some sense, but how can we make this statement more concrete? 

Shadowing is particularly poorly understood in higher dimensional systems. It
might be helpful to further investigate the invariant manifolds of nonuniformly
hyperbolic systems in order to better understand shadowing results. In particular, it would
be interesting to investigate more quantitative results concerning the folding behavior 
observed in this report and to specify how this phenomenon affects shadowing behavior 
in general. 

There is also work to be done in figuring out exactly what the rate of convergence 
is likely to be for parameter estimation algorithms, in particular when those algorithms 
are applied to multi-dimensional nonuniformly hyperbolic systems. This is important if 
we would like to choose a system to optimize for parameter sensitivity. The conjectures 
of section 3.5 seem to be a good place to start. 



7.2.2 Parameter estimation algorithms 

There are a number of ways in which the parameter estimation algorithm could probably 
be improved. For instance, the biggest problem now seems to be in the behavior of the 
top-level scanning Kalman filter. Is there a better way of detecting where the parameter- 
sensitive stretches of data occur? Perhaps a better solution would be to use some sort 
of fixed-lag smoother so that data is taken from both forwards and backwards in time 
in order to smooth out local stretches of parameter-sensitive data. 






Also, is there a nicer way of representing the state-parameter space probability den- 
sities? It is clear that linear representations like those in the extended Kalman filter 
cannot do the job. I have tried a number of other representation forms without success, 
and eventually resorted to a Monte-Carlo based method. Perhaps a more efficient but 
still effective representation form for the densities can be found. 



7.2.3 Applications 

Most importantly, there are still questions about how to apply parameter estimation 
in chaotic time series to problems like high precision measurement, control, or other 
possible applications. This report shows that many chaotic systems exhibit some special 
properties that would aid someone who is interested in knowing the parameters of a 
system based on state data. Now that we have a better theoretical base for understand- 
ing what factors affect parameter estimation in chaotic systems, it should be easier to 
understand how and when to apply the resulting algorithmic tools. 

As for the possibility of high precision measurement applications, this idea certainly 
merits additional research in light of the results in this report. The main problem here 
would be to find a suitable application where the quantity to be measured is physically 
interesting and the chaotic system involved satisfies all the right properties. For instance, 
this technique would ideally be applied to a system that is well-modeled by a relatively 
simple set of equations. The problem would be to find a suitable setup that would 
make the application worthwhile, and/or to increase the sophistication of the parameter 
estimation algorithms to handle a larger set of experimental situations.

Appendix A

Proofs from Chapter 2 



This appendix contains notes on three proofs from Chapter 2. Note that in the first two
theorems (sections A.1 and A.2), we reverse the names of the functions $f$ and $g$ from
the corresponding theorems in the text of this report. This is done to conform with the
notation used in Walters' paper [62]. The notation in the appendix is the same as in
Walters, while the notation in the text is switched.



A.1 Proof of Theorem 2.2.3

Theorem 2.2.3: (Walters) Let $f : M \to M$ be an expansive diffeomorphism with the
pseudo-orbit shadowing property. Suppose there exists a neighborhood, $V \subset \mathrm{Diff}^1(M)$, of
$f$ that is uniformly expansive. Then $f$ is structurally stable.

Proof: This is based on theorems 4 and 5 and the remark on page 237 in [62]. In theorem
4, Walters states that an expansive homeomorphism with the pseudo-orbit shadowing
property is "topologically stable." However, Walters' definition of topological stability is
weaker than our definition of structural stability. In particular, for topological stability
of $f$, Walters requires that there exist a neighborhood, $U \subset \mathrm{Diff}^1(M)$, of $f$ such that for
each $g \in U$, there is a continuous map $h : M \to M$ such that $hg = fh$. For structural
stability, this $h$ must be a homeomorphism. We can get the injectiveness of $h$ from
the uniform expansiveness of nearby maps (apply theorem 5 of [62]). We can get the
surjectiveness of $h$ from the compactness of $M$ based on an argument from algebraic
topology (see Lemma 3.11 in [38], page 36). Since $M$ is compact, and $h$ is injective and
surjective, $h$ must be a homeomorphism.

A.2 Proof of Theorem 2.2.4

Theorem 2.2.4: Let $f : M \to M$ be an expansive diffeomorphism with the function
shadowing property. Suppose there exists a neighborhood, $V \subset \mathrm{Diff}^1(M)$, of $f$ such that
$V$ is uniformly expansive. Then $f$ is structurally stable.

Proof: The proof given here is similar to theorem 4 of [62] except that the effective roles
of $f$ and $g$ are reversed (where $g$ denotes maps near $f$ in $\mathrm{Diff}^1(M)$). Instead of knowing
that all orbits of nearby systems can be shadowed by real orbits of $f$ (pseudo-orbit
shadowing), here we are given that all orbits of $f$ can be shadowed by real orbits of any
nearby system (function shadowing).

We shall prove that there is a neighborhood $U \subset V$ of $f$ in $\mathrm{Diff}^1(M)$ such that for
any $g \in U$, there exists a continuous $h$ such that $hf = gh$ (note that the $h$ we use here
is the inverse of the one in theorem 2.2.3). From this result we can use the arguments
outlined for theorem 2.2.3 to show that $h$ is a homeomorphism because of the uniform
expansiveness of $f$ and the compactness of $M$.

First we need to show the existence of a function $h : M \to M$ such that $hf = gh$.
From the function shadowing property, given any $\epsilon > 0$, there exists a neighborhood,
$U_\epsilon \subset V$, of $f$ such that any orbit of $f$ is $\epsilon$-shadowed by an orbit of $g \in U_\epsilon$.

Now suppose that $\epsilon < \frac{1}{2} \inf_{g \in V} e(g)$. In this case, we claim that there is exactly
one orbit of $g$ that $\epsilon$-shadows any particular orbit of $f$. If this were not true then two
different orbits of $g$, $\{x_n\}$ and $\{y_n\}$, must shadow the same orbit of $f$. But because of
the expansiveness of $g$ there must exist an integer, $N$, such that $d(x_N, y_N) > 2\epsilon$, so that
$\{x_n\}$ and $\{y_n\}$ clearly cannot $\epsilon$-shadow the same orbit of $f$. Thus we can see that there
must be a function $h$ which maps each orbit of $f$ to a shadowing orbit of $g$.

Consequently, for any $\epsilon > 0$, there exists a neighborhood $U_\epsilon$ such that for any $g \in U_\epsilon$,
we can define a function $h$ such that $hf = gh$ and:

$$\sup_{x \in M} d(h(x), x) < \epsilon. \tag{A.1}$$

We now need to show that this $h$ is also continuous. To do this we first need the following
lemma from [62]:

Lemma A.2.1 (Lemma 2 in [62]) Let $f$ be expansive with expansive constant $e(f) > 0$.
Given any $\delta > 0$, there exists $N > 1$ such that $d(f^n(x), f^n(y)) < e(f)$ for $|n| < N$
implies $d(x, y) < \delta$.

Proof of Lemma: Given $\delta > 0$, suppose that the lemma is not true so that no such
$N$ can be chosen. Then there exist sequences of points, $\{x_i\}_{i=1}^{\infty}$ and $\{y_i\}_{i=1}^{\infty}$ (not
orbits), such that for any $N > 1$, $d(x_N, y_N) \geq \delta$ and $d(f^n(x_N), f^n(y_N)) < e(f)$ for all
$|n| < N$. Since $M$ is compact, there exist subsequences $\{x_{n_i}\}_{i=0}^{\infty}$ and $\{y_{n_i}\}_{i=0}^{\infty}$ such that $x_{n_i} \to x$
and $y_{n_i} \to y$ as $i \to \infty$, with $d(x, y) \geq \delta$. By continuity of $f$ this implies that
$d(f^n(x), f^n(y)) \leq e(f)$ for all $n$, which is a direct contradiction of the expansiveness of
$f$. This completes the proof of lemma A.2.1.

Returning to the proof of theorem 2.2.4, we now want to show the continuity of $h$. In
other words, given any $\alpha > 0$ we need to show there exists a $\delta > 0$ such that $d(x, y) < \delta$
implies $d(h(x), h(y)) < \alpha$.

Our strategy is as follows: Since $g$ is expansive, from lemma A.2.1 we know that
for any $\alpha > 0$ we can choose $N_\alpha$ such that if $d(g^n(h(x)), g^n(h(y))) < e(g)$ for $|n| < N_\alpha$
then $d(h(x), h(y)) < \alpha$. Thus suppose that for any $\alpha > 0$ there exists $\delta > 0$ such that
$d(x, y) < \delta$ implies $d(g^n(h(x)), g^n(h(y))) < e(g)$ for all $|n| < N_\alpha$. Then $d(h(x), h(y)) < \alpha$,
and $h$ must be continuous. This is what we shall show.

Given $\alpha > 0$, set $e(V) = \inf_{g \in V} e(g)$ and fix $\epsilon = \frac{1}{3} e(V)$. Since $f$ and $f^{-1}$ are
continuous and $M$ is compact, we can pick $\delta > 0$ such that $d(x, y) < \delta$ implies
$d(f^n(x), f^n(y)) < \epsilon$ if $|n| < N_\alpha$. From equation (A.1) we know that given this $\epsilon > 0$,
there exists a neighborhood, $U_\epsilon \subset V$, of $f$ in $\mathrm{Diff}^1(M)$ such that for any $g \in U_\epsilon$,
there exists $h$ such that $hf = gh$ and $\sup_{x \in M} d(h(x), x) < \epsilon$. Thus for any $g \in U_\epsilon$ and
corresponding $h : M \to M$, if $d(x, y) < \delta$ then we have:

$$\begin{aligned}
d(g^n(h(x)), g^n(h(y))) &= d(h(f^n(x)), h(f^n(y))) \\
&\leq d(h(f^n(x)), f^n(x)) + d(f^n(x), f^n(y)) + d(f^n(y), h(f^n(y))) \\
&< e(V) \leq e(g) \quad \text{for all } |n| < N_\alpha.
\end{aligned}$$

From the argument in the previous paragraph, this shows that $h$ must be continuous
which completes the proof of theorem 2.2.4.



A.3 Proof of Lemma 2.3.1

Lemma 2.3.1: Suppose that $f_p \in \mathrm{Diff}^1(M)$ for $p \in I_p \subset \mathbb{R}$, and let $f(x, p) = f_p(x)$ for
any $x \in M$. Suppose also that $f$ is $C^1$ and that $f_{p_0}$ is an absolutely structurally stable
diffeomorphism for some $p_0 \in I_p$. Then there exist $\epsilon_0 > 0$ and $K > 0$ such that for every
positive $\epsilon < \epsilon_0$, any orbit of $f_{p_0}$ can be $\epsilon$-shadowed by an orbit of $f_p$ for $p \in B(p_0, K\epsilon)$.

Proof: This follows from the definition of absolute structural stability. From that
definition, we know that there exist $\epsilon_0 > 0$, $K_1 > 0$, and conjugating homeomorphisms,
$h_p : M \to M$, such that if $p \in B(p_0, \epsilon_0)$, then:

$$\sup_{x \in M} d(h_p^{-1}(x), x) \leq K_1 \sup_{x \in M} d(f_{p_0}(x), f_p(x)),$$

where $f_{p_0} = h_p f_p h_p^{-1}$. Given an orbit, $\{x_n\}$, of $f_{p_0}$ we claim that $h_p^{-1}$ maps $x_n$ onto a
suitable shadowing orbit, $z_n(p)$, of $f_p$ for each $n \in \mathbb{Z}$. Also, since $f$ is $C^1$ for $(x, p) \in M \times I_p$,
there exists a constant, $K_2 > 0$, such that $\sup_{x \in M} d(f_{p_0}(x), f_p(x)) \leq K_2 |p - p_0|$ for any
$p \in I_p$. Thus, setting $z_n(p) = h_p^{-1}(x_n)$ for all $n$ we see that:

$$\begin{aligned}
\sup_{n \in \mathbb{Z}} d(z_n(p), x_n) &\leq \sup_{x \in M} d(h_p^{-1}(x), x) \\
&\leq K_1 \sup_{x \in M} d(f_{p_0}(x), f_p(x)) \\
&\leq K_1 K_2 |p - p_0|.
\end{aligned}$$

Now setting $K = \frac{1}{K_1 K_2}$, we have the desired result that $\sup_{n \in \mathbb{Z}} d(z_n(p), x_n) < \epsilon$
if $p \in B(p_0, K\epsilon)$, for any positive $\epsilon < \epsilon_0$. This completes the proof of
lemma 2.3.1.

Appendix B



Proof of theorem 3.2.1 



In this appendix, we present the proof for theorem 3.2.1. 



B.1 Preliminaries

We first repeat the related definitions which are the same as those found in chapter 3.
Throughout this appendix we shall assume that $I \subset \mathbb{R}$ represents a compact interval of
the real line.

Definitions: Suppose that $f : I \to I$ is continuous. Then the turning points of $f$ are
the local extrema of $f$ in the interior of $I$. $C(f)$ is used to designate the set of all turning
points of $f$ on $I$. $C^r(I, I)$ is the set of continuous maps on $I$ such that $f \in C^r(I, I)$ if:

(a) $f$ is $C^r$ (for $r \geq 0$)

(b) $f(I) \subset I$, and

(c) $f(\mathrm{Bd}(I)) \subset \mathrm{Bd}(I)$ (where $\mathrm{Bd}(I)$ denotes the boundary of $I$).

If $f \in C^r(I, I)$ and $g \in C^r(I, I)$, let $d(f, g) = \sup_{x \in I} |f(x) - g(x)|$.

Definitions: A continuous map $f : I \to I$ is said to be piecewise monotone if $f$ has
finitely many turning points. $f$ is said to be a uniformly piecewise-linear mapping if it
can be written in the form:

$$f(x) = \alpha_i \pm s x \quad \text{for } x \in [c_{i-1}, c_i] \tag{B.1}$$

where $s > 1$, $c_0 < c_1 < \ldots < c_q$ and $q > 0$ is an integer. (We assume $s > 1$ because
otherwise there will not be any interesting behavior).
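
As a concrete illustration of (B.1), the following Python sketch builds a continuous map whose slope is $+s$ or $-s$ on each interval $[c_{i-1}, c_i]$. The helper name `make_upl_map`, its arguments, and the choice of a tent map with $s = 1.9$ are assumptions made for this example only; they do not come from the report.

```python
from bisect import bisect_right

def make_upl_map(breakpoints, signs, s, left_value):
    """Build a continuous map with slope +s or -s on each [c_{i-1}, c_i].

    breakpoints: c_0 < c_1 < ... < c_q; signs: +1 or -1 for each piece;
    left_value: f(c_0). The offsets alpha_i of (B.1) follow from continuity.
    """
    if len(signs) != len(breakpoints) - 1:
        raise ValueError("need exactly one sign per interval")
    # Value of f at every breakpoint, accumulated piece by piece.
    values = [left_value]
    for i, sign in enumerate(signs):
        width = breakpoints[i + 1] - breakpoints[i]
        values.append(values[-1] + sign * s * width)

    def f(x):
        if not breakpoints[0] <= x <= breakpoints[-1]:
            raise ValueError("x lies outside the interval I")
        # Index of the piece containing x (the last piece when x = c_q).
        i = min(bisect_right(breakpoints, x) - 1, len(signs) - 1)
        return values[i] + signs[i] * s * (x - breakpoints[i])

    # Turning points: interior breakpoints where the slope changes sign.
    turning = [breakpoints[i + 1] for i in range(len(signs) - 1)
               if signs[i] != signs[i + 1]]
    return f, turning

# Hypothetical example: a tent map on I = [0, 1] with slope s = 1.9.
tent, turning_points = make_upl_map([0.0, 0.5, 1.0], [+1, -1],
                                    s=1.9, left_value=0.0)
```

Here `tent(0.25)` and `tent(0.75)` both evaluate to about 0.475, and the only turning point is 0.5.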

Note that for this section, it is useful to define neighborhoods, $B(x, \epsilon)$, so that they
do not extend beyond the confines of $I$. In other words, let $B(x, \epsilon) = (x - \epsilon, x + \epsilon) \cap I$.
With this in mind, we use the following definitions to describe some relevant properties
of piecewise monotone maps.

Definition: A piecewise monotone map, $f : I \to I$, is said to be transitive if for any
two open sets $U, V \subset I$, there exists an $n > 0$ such that $f^n(U) \cap V \neq \emptyset$.

Definitions: Let $f : I \to I$ be piecewise monotone. Then $f$ satisfies the linking
property if for every $c \in C(f)$ and any $\epsilon > 0$ there is a point $z \in I$ such that $z \in B(c, \epsilon)$,
$f^n(z) \in C(f)$ for some integer $n > 0$, and $|f^i(c) - f^i(z)| < \epsilon$ for every $i \in \{1, 2, \ldots, n\}$.
Suppose, in addition, that we can always pick $z \neq c$ such that the above condition is
satisfied. Then $f$ is said to satisfy the strong linking property.

We are now ready to state the objective of this appendix:

Theorem 3.2.1 Transitive piecewise monotone maps satisfy the function shadowing
property in $C^0(I, I)$ if and only if they satisfy the strong linking property.

We note that Liang Chen [12] proves a similar result, namely that the pseudo-orbit shadowing
property is equivalent to the linking property for maps topologically conjugate
to uniformly piecewise linear mappings. Some parts of the proof we describe below 
are also similar to the work of Coven, Kan, and Yorke [17] for tent maps (uniformly 
piecewise linear maps with one turning point). The main difference is that they prove 
a pseudo-orbit shadowing property while we are interested in parameter and function 
shadowing. 



B.2 Proof 

This section will be devoted to the proof of theorem 3.2.1 and related results. The basic
strategy of the proof will be as follows. First we relate piecewise monotone mappings to
piecewise linear mappings through a topological conjugacy (lemmas B.2.1 and B.2.2).
This provides for uniform hyperbolicity away from the turning points. Second we capture
the effects of "folding" near turning points and show how this leads to function shadowing
(lemmas B.2.4, B.2.5, B.2.6). Finally in lemma B.2.7 we show that the local folding effects
of lemmas B.2.4, B.2.5, or B.2.6 are satisfied for the maps we are interested in.

Lemma B.2.1: Let $f : I \to I$ be a transitive piecewise-monotone mapping. Then $f$ is
topologically conjugate to a uniformly piecewise-linear mapping.

Proof: See Parry [51] and Coven and Mulvey [18]. 

The following lemma is necessary for the application of the topological conjugacy 
result. 

Lemma B.2.2 Let $f : I \to I$ and $g : I \to I$ be two topologically conjugate continuous
maps. If $f$ has the linking or strong linking property then $g$ must have these properties
also. If $f$ has the function shadowing property on $C^0(I, I)$, then $g$ must also
satisfy the function shadowing property on $C^0(I, I)$.

Proof: Since $f$ and $g$ are conjugate, the orbits of $f$ and $g$ are connected through a
homeomorphism, $h$, such that $g = h f h^{-1}$. Because $h$ is continuous and one-to-one, the
turning points of $f$ and $g$ must be preserved by the topological conjugacy. Thus if $f$
has the linking or strong linking properties, then $g$ must have these properties also.

Now suppose that $f$ has the function shadowing property on $C^0(I, I)$. We want to
show that $g$ also has this function shadowing property which means that for any $\epsilon > 0$,
there exists a neighborhood, $V$, of $g$ in $C^0(I, I)$ such that if $g^* \in V$ then any orbit of $g$
is $\epsilon$-shadowed by an orbit of $g^*$.

Since $h$ is continuous, and $I$ is compact, we know that given $\epsilon > 0$ there exists $\delta > 0$
such that $|x - y| < \delta$ implies $|h(x) - h(y)| < \epsilon$ if $x, y \in I$. Given this $\delta > 0$, since $f$ has
the function shadowing property, there is a neighborhood $U \subset C^0(I, I)$ of $f$ such that
if $f^* \in U$, then any orbit of $f$ can be $\delta$-shadowed by an orbit of $f^*$. Let $V = h U h^{-1}$.
Since $g = h f h^{-1}$, $V$ must contain a neighborhood of $g$ in $C^0(I, I)$. We now must show
if $g^* \in V$, then any orbit of $g$ can be $\epsilon$-shadowed by an orbit of $g^*$.

Suppose we are given an orbit, $\{x_n\}$, of $g$ and any $g^* \in V$. Let $\{w_n\}$ be the
corresponding orbit of $f$ such that $w_n = h^{-1}(x_n)$. Set $f^* = h^{-1} g^* h$. Since $f^* \in U$, there exists
an orbit, $\{y_n\}$, of $f^*$ that $\delta$-shadows $\{w_n\}$. Then if $z_n = h(y_n)$, $\{z_n\}$ must be an orbit
of $g^*$ that $\epsilon$-shadows $\{x_n\}$, since $|h(x) - h(y)| < \epsilon$ if $|x - y| < \delta$. This proves the lemma.
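
A standard textbook instance of such a conjugacy, not taken from this report, is the pair formed by the full tent map $T$ (uniformly piecewise linear with slope 2) and the logistic map $L(y) = 4y(1 - y)$ (piecewise monotone), linked by $h(x) = \sin^2(\pi x / 2)$ through $h \circ T = L \circ h$. The Python sketch below checks this relation numerically and transports a tent-map orbit through $h$; the function names are chosen here for illustration.

```python
import math

def tent(x):
    """Full tent map on [0, 1] (uniformly piecewise linear, slope 2)."""
    return 1.0 - abs(1.0 - 2.0 * x)

def logistic(y):
    """Logistic map y -> 4 y (1 - y) on [0, 1] (piecewise monotone)."""
    return 4.0 * y * (1.0 - y)

def h(x):
    """Conjugating homeomorphism of [0, 1]: h(x) = sin^2(pi x / 2)."""
    return math.sin(0.5 * math.pi * x) ** 2

def conjugacy_defect(num_points=1001):
    """Largest |h(tent(x)) - logistic(h(x))| over a uniform grid on [0, 1]."""
    grid = [k / (num_points - 1) for k in range(num_points)]
    return max(abs(h(tent(x)) - logistic(h(x))) for x in grid)

def transported_orbit(x0, steps):
    """Push a tent-map orbit through h; the image should be a logistic orbit."""
    xs = [x0]
    for _ in range(steps):
        xs.append(tent(xs[-1]))
    return [h(x) for x in xs]
```

Up to rounding, `conjugacy_defect()` is zero, and consecutive entries of `transported_orbit(0.3, 8)` satisfy `ys[k + 1] == logistic(ys[k])`, which is the orbit correspondence the proof relies on. Only short orbits are used because slope-2 tent orbits degrade quickly in binary floating point.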

Thus, combining lemmas B.2.1 and B.2.2, we see that the problem of proving the 
function shadowing property for transitive piecewise-monotone maps with the strong 
linking property reduces to proving the function shadowing property for uniformly piece- 
wise linear maps with the strong-linking property. 

We now introduce one more result that will be useful later on: 

Lemma B.2.3 Let $f : I \to I$. Suppose $f^n$ satisfies the function shadowing property on
$C^0(I, I)$ for some integer $n > 0$. Then $f$ has the function shadowing property on $C^0(I, I)$.

Proof: Given any $\epsilon > 0$ we need to show that there exists a neighborhood, $U$, of $f$ in
$C^0(I, I)$ such that if $g \in U$, then any orbit of $f$ is $\epsilon$-shadowed by an orbit of $g$. Since $f$
is continuous and $I$ is compact, there exists a $\delta > 0$ such that if $|x - y| < \delta$, then

$$|f^i(x) - f^i(y)| < \tfrac{1}{2}\epsilon \tag{B.2}$$

for any $i \in \{0, 1, \ldots, n\}$ and $x, y \in I$. We also know that there exists a neighborhood,
$V_1$, of $f$ in $C^0(I, I)$ such that if $g \in V_1$:

$$|f^i(x) - g^i(x)| < \tfrac{1}{2}\epsilon \tag{B.3}$$

for all $x \in I$ and $i \in \{0, 1, \ldots, n\}$.

Combining (B.2) and (B.3) and using the triangle inequality we see that for any
$\epsilon > 0$ there exists a $\delta > 0$ and a neighborhood, $V_1$, of $f$ in $C^0(I, I)$ such that if $g \in V_1$
and $|x - y| < \delta$, then:

$$|f^i(x) - g^i(y)| < \epsilon \tag{B.4}$$

for all $i \in \{0, 1, \ldots, n\}$ if $x, y \in I$. Given $\epsilon > 0$, fix $\delta > 0$ and $V_1 \subset C^0(I, I)$ to satisfy
(B.4).

Using this $\delta > 0$, since $f^n$ has the function shadowing property, we know there exists
a neighborhood, $V_2$, of $f^n$ in $C^0(I, I)$ such that if $g^n \in V_2$, then any orbit of $f^n$ is
$\delta$-shadowed by an orbit of $g^n$. Given this neighborhood, $V_2$, of $f^n$, we can always pick a
neighborhood, $V_3 \subset C^0(I, I)$, of $f$ such that $g \in V_3$ implies that $g^n \in V_2$. This is apparent,
since for any $\alpha > 0$ there exists a neighborhood $V_3$ of $f$ in $C^0(I, I)$ such that

$$d(f^n, g^n) = \sup_{x \in I} |f^n(x) - g^n(x)| < \alpha$$

if $g \in V_3$. Thus, for any $\epsilon > 0$, if $g \in V_3$, then any orbit of $f^n$ is $\delta$-shadowed by an orbit
of $g^n$.

Now set $U = V_1 \cap V_3$. Note that $U$ must contain a neighborhood of $f$ in $C^0(I, I)$.
If we fix $g \in U$, we find that given any orbit, $\{x_i\}_{i=0}^{\infty}$, of $f$, there is an orbit, $\{y_i\}_{i=0}^{\infty}$, of $g$
such that $y_i \in B(x_i, \delta)$ if $i = kn$ for any $k \in \{0, 1, \ldots\}$. Thus, from (B.4), we know that
$y_i \in B(x_i, \epsilon)$ for all $i \geq 0$. Consequently, given any $\epsilon > 0$, there exists a neighborhood $U$
of $f$ in $C^0(I, I)$ such that if $g \in U$, then any orbit of $f$ can be $\epsilon$-shadowed by an orbit
of $g$. This is what we set out to prove.

We now examine the mechanism underlying shadowing in one-dimensional maps. In 
the next three lemmas we look at how local "folding" can lead to shadowing. 

Lemma B.2.4 Given $f \in C^0(I, I)$, suppose that for any $\epsilon > 0$ sufficiently small there
exists a neighborhood, $U$, of $f$ in $C^0(I, I)$ such that if $g \in U$,

$$g(B(x, \epsilon)) \supset B(f(x), \epsilon) \tag{B.5}$$

for all $x \in I$. Then $f$ has the function shadowing property in $C^0(I, I)$.

Proof: Let $\{x_n\}$ be an orbit of $f$ and suppose that (B.5) is satisfied. Then if $g \in U$, for
any $y_1 \in I$ with $y_1 \in B(x_1, \epsilon)$ we can choose a $y_0 \in I$ so that $y_0 \in B(x_0, \epsilon)$ and $y_1 = g(y_0)$.
Similarly for any $y_2 \in I$ with $y_2 \in B(x_2, \epsilon)$, we can pick $y_1$ and $y_0$ within $\epsilon$ distance of $x_1$
and $x_0$, respectively. Extending this argument for arbitrarily many iterates we see that
(B.5) implies that there exists an orbit, $\{y_i\}$, of $g$ so that $y_i \in B(x_i, \epsilon)$ for all integer
$i \geq 0$. Thus, given any $\epsilon > 0$ sufficiently small, there exists a neighborhood, $U$, of $f$ in
$C^0(I, I)$ such that if $g \in U$, then any orbit of $f$ can be $\epsilon$-shadowed by an orbit of
$g$.

Lemma B.2.5 Let $f \in C^0(I, I)$. Suppose that for any $\epsilon > 0$ sufficiently small, there
exists $N > 0$ and a neighborhood, $U$, of $f$ in $C^0(I, I)$ such that for any $g \in U$, there
exists a function $n : I \to \mathbb{Z}^+$ so that for each $x \in I$:

$$\{g^{n(x)}(y) : |f^i(x) - g^i(y)| < \epsilon, \ 0 \leq i \leq n(x)\} \supset B[f^{n(x)}(x), \epsilon] \tag{B.6}$$

where $1 \leq n(x) \leq N$ for all $x \in I$. Then $f$ has the function shadowing property in
$C^0(I, I)$.

Proof: The idea is very similar to lemma B.2.4. Let $\{x_n\}$ be an orbit of $f$. In lemma B.2.4,
given sufficiently small $\epsilon > 0$ and $g \in U$, we could always choose $y_0 \in B(x_0, \epsilon)$ given a
$y_1 \in B(x_1, \epsilon)$ so that $y_1 = g(y_0)$. A similar thing applies here except that we have to
consider the iterates in groups. Suppose that the premise of lemma B.2.5 is satisfied.
Given sufficiently small $\epsilon > 0$, fix $g \in U$. Then, for any $y_{n(x_0)} \in B(x_{n(x_0)}, \epsilon)$, there exists a
finite orbit $Y_0 = \{y_i\}_{i=0}^{n(x_0)}$ of $g$ such that $|x_i - y_i| < \epsilon$ for $i \in \{0, 1, \ldots, n(x_0)\}$. Similarly,
we can play the same trick starting with $y_{n(x_0)}$ for the next $n(x_{n(x_0)})$ group of iterates,
constructing another finite orbit, $Y_1 = \{y_i\}_{i=n(x_0)}^{n(x_0)+n(x_{n(x_0)})}$, of $g$. Since we are free to choose
$Y_0$ from any $y_{n(x_0)} \in B(x_{n(x_0)}, \epsilon)$, it is clear that given any $Y_1$ we can pick a $Y_0$ belonging
to the same infinite forward orbit of $g$, thereby allowing us to concatenate $Y_0$ and $Y_1$ to
construct a single finite orbit of $g$, $\{y_i\}_{i=0}^{n(x_0)+n(x_{n(x_0)})}$, that $\epsilon$-shadows $\{x_i\}_{i=0}^{n(x_0)+n(x_{n(x_0)})}$.
This process can be repeated indefinitely for arbitrarily many groups of iterates, gluing
together each group of iterates as we go. Thus the function shadowing property holds.
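
The backward, one-iterate-at-a-time selection used in the proof of lemma B.2.4 (which lemma B.2.5 extends to groups of iterates) can be sketched numerically. The Python example below is only an illustration under assumptions of our own: $f$ and $g$ are tent maps $x \mapsto s \min(x, 1 - x)$ with slopes 1.9 and 1.91, chosen so that the required preimages under $g$ exist. It does not verify condition (B.5) itself, and the names are hypothetical.

```python
def tent(s):
    """Tent map x -> s * min(x, 1 - x) on [0, 1], for 1 < s <= 2."""
    return lambda x: s * min(x, 1.0 - x)

def forward_orbit(f, x0, n):
    """Return [x_0, ..., x_n] with x_{k+1} = f(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

def backward_shadow(xs, s_g):
    """Build an orbit of g = tent(s_g) that tracks xs, working backwards.

    Start from y_n = x_n; for k = n-1, ..., 0 pick the preimage of
    y_{k+1} under g that lies on the same side of 1/2 as x_k.
    """
    ys = [xs[-1]]
    for x in reversed(xs[:-1]):
        target = ys[-1]
        if target > s_g / 2.0:
            raise ValueError("no preimage under g: target above its maximum")
        pre = target / s_g  # preimage on the left (increasing) branch
        ys.append(pre if x <= 0.5 else 1.0 - pre)
    ys.reverse()
    return ys

s_f, s_g = 1.9, 1.91  # assumed slopes of f and of the perturbed map g
xs = forward_orbit(tent(s_f), 0.3, 200)
ys = backward_shadow(xs, s_g)
shadow_distance = max(abs(x - y) for x, y in zip(xs, ys))
```

With these slopes the constructed sequence is an orbit of $g$ (up to rounding) that stays within about 0.0055 of the orbit of $f$. Choosing `s_g` smaller than `s_f` can leave a target above the maximum of $g$, in which case the sketch raises an error rather than returning a shadowing orbit.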

Lemma B.2.6 Let $f \in C^0(I, I)$. Suppose that for any $\epsilon > 0$ sufficiently small, there
exists $N > 0$ and a neighborhood, $U$, of $f$ in $C^0(I, I)$ such that for any $g \in U$, there
exists a function $n : I \to \mathbb{Z}^+$ so that for each $x \in I$:

$$\{g^{n(x)+1}(y) : |x - y| < \epsilon, \ |f^i(x) - g^i(y)| < 8\epsilon, \ 1 \leq i \leq n(x)\} \supset g[B(f^{n(x)}(x), \epsilon)] \tag{B.7}$$

where $1 \leq n(x) \leq N$ for all $x \in I$. Then $f$ has the function shadowing property in
$C^0(I, I)$.

Proof: (compare with lemma 2.4 of [17]). We shall show that given sufficiently small
$\epsilon > 0$ and any $g \in U$, if (B.7) is satisfied, then for any orbit, $\{x_i\}_{i=0}^{\infty}$, of $f$, there exists
an orbit, $\{y_i\}_{i=0}^{\infty}$, of $g$ such that $|x_i - y_i| < 8\epsilon$ for all integer $i \geq 0$. By condition (B.7),
given any $y^0_{n(x_0)+1} \in g(B[x_{n(x_0)}, \epsilon])$ we can choose a finite orbit, $Y_0 = \{y^0_i\}_{i=0}^{n(x_0)}$, of $g$ that
$8\epsilon$-shadows $\{x_i\}_{i=0}^{n(x_0)}$ and satisfies $g(y^0_{n(x_0)}) = y^0_{n(x_0)+1}$. Similarly, using the same trick with
the next $n(x_{n(x_0)})$ iterates, we can construct a finite orbit, $Y_1 = \{y^1_i\}_{i=n(x_0)}^{n(x_0)+n(x_{n(x_0)})}$, of $g$
that $8\epsilon$-shadows $\{x_i\}_{i=n(x_0)}^{n(x_0)+n(x_{n(x_0)})}$ and satisfies $y^1_{n(x_0)} \in B(x_{n(x_0)}, \epsilon)$.

Also, notice that given $Y_1$ we can always choose a $Y_0$ so that $g(y^0_{n(x_0)}) = y^1_{n(x_0)+1}$.
This is because we know that $y^1_{n(x_0)} \in B[x_{n(x_0)}, \epsilon]$ and because we are free to choose any
$y^0_{n(x_0)+1} \in g(B[x_{n(x_0)}, \epsilon])$ to construct $Y_0$. Consequently we can concatenate $Y_0$ and $Y_1$ to
form an orbit that $8\epsilon$-shadows $\{x_i\}_{i=0}^{n(x_0)+n(x_{n(x_0)})}$. We can continue this construction by
concatenating more groups of $n(x_i)$ iterates for increasingly large $i$. Thus given (B.7) it
is apparent that we can choose an orbit, $\{y_i\}_{i=0}^{\infty}$, of $g$ that $8\epsilon$-shadows any orbit of $f$ if
$g \in U$. This proves the lemma.

Now we must show that lemma B.2.6 is satisfied for any uniformly piecewise-linear
map. Note that condition (B.6) in lemma B.2.5 in fact implies (B.7) in lemma B.2.6, so
it is sufficient to show that either (B.6) or (B.7) is true for any particular $x \in I$. This
is done in lemma B.2.7 below. We can then combine lemma B.2.7 with lemma B.2.3 to
prove theorem 3.2.1.
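
One way to read the combination of lemma B.2.7 (stated for slope $s > 9$) with lemma B.2.3 is that an iterate $f^n$ of a uniformly piecewise-linear map with slope $s > 1$ is again uniformly piecewise linear, with slope $s^n$, so a large enough $n$ brings the slope above 9. This reading is our interpretation rather than a statement quoted from the report. The Python sketch below, with hypothetical helper names and an assumed tent map of slope 1.9, computes the smallest such $n$ and checks the slope of $f^n$ by a difference quotient.

```python
def iterates_for_slope(s, threshold=9.0):
    """Smallest n >= 1 with s**n > threshold, for a slope s > 1."""
    if s <= 1.0:
        raise ValueError("slope must exceed 1")
    n, slope = 1, s
    while slope <= threshold:
        slope *= s
        n += 1
    return n

def iterate(f, n):
    """Return the n-fold composition of f with itself."""
    def fn(x):
        for _ in range(n):
            x = f(x)
        return x
    return fn

def numerical_slope(f, x, step=1e-6):
    """Magnitude of a one-sided difference quotient of f at x."""
    return abs(f(x + step) - f(x)) / step

# Assumed example map: tent map with slope 1.9 on [0, 1].
tent19 = lambda x: 1.9 * min(x, 1.0 - x)
n = iterates_for_slope(1.9)
slope_of_iterate = numerical_slope(iterate(tent19, n), 0.3)
```

For slope 1.9 this gives `n = 4`, and the measured slope of the fourth iterate at 0.3 agrees with `1.9 ** 4` (about 13.03) as long as no turning point lies between the two sample points.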

First, however, we introduce the following notation, in order to state our results more 
concisely. 

Definition: Given a map, $f \in C^0(I, I)$, define:

$$D_k(x, g, \epsilon) = \{g^k(y) : y \in I, \ |f^i(x) - g^i(y)| < \epsilon \text{ for } i \in \{0, 1, \ldots, k\}\}$$

$$E_k(x, g, \epsilon) = \{g^k(y) : y \in I, \ |x - y| < \epsilon, \text{ and } |f^i(x) - g^i(y)| < 8\epsilon \text{ for } i \in \{1, 2, \ldots, k\}\}$$

for any $x \in I$, $k \in \mathbb{Z}^+$, and $\epsilon > 0$ where $g \in C^0(I, I)$ is a $C^0$ perturbation of $f$. Although
$D_k(x, g, \epsilon)$ and $E_k(x, g, \epsilon)$ also depend on $f$ we leave out this dependence because $f$
will always refer to the uniformly piecewise linear map specified in the statement of
lemma B.2.7 below.
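
To get a feel for the set $D_k(x, g, \epsilon)$, the following Python sketch estimates its extent by testing a grid of candidate points $y$. The grid sampling, the function names, and the tent map with slope 1.9 are assumptions made for illustration; the report itself works with slopes $s > 9$ and exact sets rather than samples.

```python
def iterate_list(f, x, k):
    """Return [x, f(x), ..., f^k(x)]."""
    out = [x]
    for _ in range(k):
        out.append(f(out[-1]))
    return out

def sample_D(f, g, x, k, eps, lo=0.0, hi=1.0, num=20001):
    """Grid estimate of (min, max) of D_k(x, g, eps) on I = [lo, hi].

    Returns None if no sampled y satisfies the constraints.
    """
    ref = iterate_list(f, x, k)
    candidates = [lo + (hi - lo) * j / (num - 1) for j in range(num)]
    candidates.append(x)  # make sure the base point itself is tested
    hits = []
    for y in candidates:
        orbit = iterate_list(g, y, k)
        # Keep g^k(y) only if the g-orbit of y stays eps-close to the f-orbit of x.
        if all(abs(a - b) < eps for a, b in zip(ref, orbit)):
            hits.append(orbit[-1])
    return (min(hits), max(hits)) if hits else None

# Assumed example: f = g = tent map with slope 1.9, x at the turning point.
tent = lambda x: 1.9 * min(x, 1.0 - x)
low, high = sample_D(tent, tent, x=0.5, k=1, eps=0.01)
```

At the turning point the sampled set reaches up only to $f(0.5) = 0.95$ and never above it, so it covers just one side of $B(f(x), \epsilon)$. This one-sided image is the "folded" segment that the remainder of the proof has to handle.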

Lemma B.2.7: Let $f : I \to I$ be a uniformly piecewise linear map with slope $s > 9$.
Suppose that $f$ satisfies the strong linking property. Then for any $\epsilon > 0$ there exists
$N > 0$ and a neighborhood, $U$, of $f$ in $C^0(I, I)$ such that for any $g \in U$ at least one of
the following two properties holds for each $x \in I$:

(I) $D_{n(x)}(x, g, \epsilon) \supset B[f^{n(x)}(x), \epsilon]$

(II) $g(E_{n(x)}(x, g, \epsilon)) \supset g(B[f^{n(x)}(x), \epsilon])$

where $n : I \to \mathbb{Z}^+$ and $1 \leq n(x) \leq N$ for all $x \in I$.

Proof of lemma B.2.7: Let $C(f) = \{c_1, c_2, \ldots, c_q\}$ where $c_1 < c_2 < \ldots < c_q$. Assume
that $\epsilon > 0$ is small enough such that

$$|c_k - c_i| > 16\epsilon$$

for any $k \neq i$.

We now utilize the strong linking property. For each $j \in \{1, 2, \ldots, q\}$ and $k \in \mathbb{Z}^+$
define $w_k(j, \epsilon) \subset I$ such that:

™*0', <0 = {/(</) : y £ /, |/*'( Cj ) - f(y)\ < \e for z £ {0, 1, . . . , £}} (B.8) 

Given $\epsilon > 0$, for each $j \in \{1, 2, \ldots, q\}$ let $m_j$ be the minimum $k$ such that

$$w_k(j, \epsilon) \cap C(f) \neq \emptyset. \tag{B.9}$$

The strong linking property implies that such $m_j$'s exist and are finite for each $j \in
\{1, 2, \ldots, q\}$ and for any $\epsilon > 0$. From (B.8) and (B.9) we can also see that for each
$j \in \{1, 2, \ldots, q\}$, there exists some $r(j) \in \{1, 2, \ldots, q\}$ such that

$$c_{r(j)} \in w_{m_j}(j, \epsilon).$$

Now set: 

<^ = — min |f mj (c 7 )-c rm | (B.10) 

f0 j€{i,2,...,,} IJ y 3j rW V ; 

and note that from (B.8) and (B.9): 

\f mi {cj)-c r(]) \<\e (B.ll) 

for any j £ {1, 2, . . . , q). Thus it is evident that: 

8 X < jt. (B.12) 

Because of the strong linking property, we know that $\delta_1 > 0$.

Also, set $M = \max_{j \in \{1, 2, \ldots, q\}} m_j$, define $\Delta_1 : C^0(I, I) \to \mathbb{R}$ such that:

$$\Delta_1(g) = \max_{i \in \{1, 2, \ldots, M\}} \sup_{x \in I} |f^i(x) - g^i(x)|, \tag{B.13}$$

and choose $U$ to be a neighborhood of $f$ in $C^0(I, I)$ such that $\Delta_1(g) < \delta_1$ for any $g \in U$.
Thus for any $g \in U$, any $x \in I$, and any $i \in \{1, 2, \ldots, M\}$:

|/*"(a)-<7*"(a)|<±e. (B.14) 

Now, let $(a ; b]$ indicate either the interval, $(a, b]$, or the interval, $[b, a)$, whichever is
appropriate. Then, since $s > 9$, for any $\epsilon > 0$ we assert that:

$$D_i(c_j, f, \epsilon) = (f^i(c_j) - \sigma_i(c_j)\epsilon \,;\, f^i(c_j)] \tag{B.15}$$

for each $j \in \{1, 2, \ldots, q\}$ and every $i \in \{1, 2, \ldots, m_j\}$ where:

$$\sigma_i(c) = \begin{cases} +1 & \text{if } f^i \text{ has a relative maximum at } c \in C(f) \\ -1 & \text{if } f^i \text{ has a relative minimum at } c \in C(f). \end{cases}$$

Note that (B.9) guarantees that $D_i(c_j, f, \epsilon) \cap C(f) = \emptyset$ for any $i \in \{1, 2, \ldots, m_j - 1\}$.
Thus, since $s > 9$, (B.15) can be shown by a simple induction on $i$.

We now proceed to the main part of the proof for lemma B.2.7:

Given any $g \in U$ we must show that for each $x \in I$ either condition (I) or (II) holds
in the statement of the lemma for some $n(x) \leq N$. We now break up the problem into
two separate cases. Given some $\epsilon > 0$ first suppose that $x$ is more than $\epsilon$ distance away
from any turning point. In other words suppose that $|x - c_j| > \epsilon$ for all $j \in \{1, 2, \ldots, q\}$.
Then we can set $n(x) = 1$ and it is easy to verify that condition (I) of the lemma holds:

$$D_1(x, g, \epsilon) = g(B(x, \epsilon)) \cap B(f(x), \epsilon) = B(f(x), \epsilon)$$

since s > 9 and \f(x) — g(x)\ < | for all x £ /. 

The other possibility is that $x$ is within $\epsilon$ distance of one of the turning points, in
other words that $x \in V$ where:

$$V = \{x \in I : |x - c_j| < \epsilon \text{ for some } j \in \{1, 2, \ldots, q\}\}.$$

Below we show that for all $g \in U$, if $x \in V$ does not satisfy condition (I) then $x$ satisfies
condition (II) of the lemma. This would complete the proof of lemma B.2.7.

Suppose that $|x - c_j| < \epsilon$ for some $j \in \{1, 2, \ldots, q\}$ and suppose that $x$ does not satisfy
condition (I) for any $n(x) \in \{1, 2, \ldots, m_j\}$. In qualitative terms, since $f$ is expansive by
a factor of $s > 9$ everywhere except at the turning points, the only way for $x$ not to
satisfy condition (I) is if $x$ is close enough to $c_j$ so that $D_i(x, g, \epsilon)$ represents a "folded"
line segment for every $i \in \{1, 2, \ldots, m_j\}$.

More precisely, for each $i \in \{1, 2, \ldots, m_j\}$ if we let

$$J_i(x, g, \epsilon) = \{y \in I : |f^k(x) - g^k(y)| < \epsilon \text{ for } k \in \{0, 1, \ldots, i\}\}$$

so that $D_i(x, g, \epsilon) = g^i(J_i(x, g, \epsilon))$, then the following claim is true.

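Claim: Given $g \in U$, suppose that $x \in B(c_j, \epsilon)$ does not satisfy condition (I) of
lemma B.2.7 for any $n(x) \in \{1, 2, \ldots, m_j\}$. Then for each $j \in \{1, 2, \ldots, q\}$ we claim
that the following three statements are true:

(1) For any $i \in \{1, 2, \ldots, m_j\}$, if we define $y_i(j) \in J_i(x, g, \epsilon)$ such that:

$$g^i(y_i(j)) = \begin{cases} \sup_{z \in J_i(x, g, \epsilon)} g^i(z) & \text{if } \sigma_i(c_j) = +1 \\ \inf_{z \in J_i(x, g, \epsilon)} g^i(z) & \text{if } \sigma_i(c_j) = -1 \end{cases} \tag{B.16}$$

then

$$D_i(x, g, \epsilon) = (f^i(x) - \sigma_i(c_j)\epsilon \,;\, g^i(y_i(j))] \tag{B.17}$$

and $g^i(y_i(j)) \in (f^i(x) - \epsilon, f^i(x) + \epsilon)$.

(2) For any $i \in \{1, 2, \ldots, m_j - 1\}$, $D_i(x, f, \epsilon) \cap C(f) = \emptyset$.

(3) For any $i \in \{1, 2, \ldots, m_j\}$, $y_i(j) \in J_i(x, f, \epsilon)$.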

Proof of claim: We prove parts (1) and (2) of this claim by induction on i. 

First we demonstrate that if conditions (1) and (2) above are true for each $i \in
\{1, 2, \ldots, k\}$ where $k \in \{1, 2, \ldots, m_j - 1\}$, then condition (1) is true for $i = k + 1$. Thus
we assume that $D_k(x, g, \epsilon)$ has the form given in (B.17), if $x \in B(c_j, \epsilon)$, so that:

$$D_k(x, g, \epsilon) \supset (f^k(x) - \sigma_k(c_j)\epsilon \,;\, g^k(x)].$$

Since |/ (x) — g (x)\ < |e, this means: 

D k (x,g,e) D (f k (x) - a k ( Cj )e ; f k (x) - -a k ( Cj )e\. 

In particular (f k (x) — ^a k (cj))e £ D k (x, g, e). Since D k (x, /, e) D (f k (x) — a k (cj)e ; /'(a:)] 
and D k (x } f } e) fl C(/) = (assuming that (2) is true for i = k) we know that [C(/) fl 
(/ fc (x) - |CT fc ( Cj )e ; /''(a;))] = 0. Thus, since s > 9 : 

g{f{x) ~ ^kic^e) £ (/ fc (x) - -s<Tfc + i(cj)e - 8 X ; / fc (x) - -sa k+1 {c 3 )e + 4) 

Now suppose that Cj is a relative maximum of the map f k+1 so that a k+ i(cj) = +1 (the 
case where a k+ i(cj) = —1 is analogous). Then we find that: 

g(f k (x) - ^k(cj)e) < f k (x) - e 

where g(f k (x) — ^a k (cj)e) £ g(D k (x, g, e)). Thus, since D k (x, g, e) and hence g(D k (x, g, e)) 
are connected sets, this means that since

$$D_{k+1}(x, g, \epsilon) = g(D_k(x, g, \epsilon)) \cap B(f^{k+1}(x), \epsilon)$$

we know that $f^{k+1}(x) - \epsilon$ must be the lower endpoint of $D_{k+1}(x, g, \epsilon)$. Also we know that

$$D_{k+1}(x, g, \epsilon) \subset (f^{k+1}(x) - \epsilon \,;\, f^{k+1}(x) + \epsilon)$$

because otherwise condition (I) is satisfied for $n(x) = k + 1$. Consequently by the
definition of $y_{k+1}(j)$ in (B.16), we see that:

$$D_{k+1}(x, g, \epsilon) = (f^{k+1}(x) - \sigma_{k+1}(c_j)\epsilon \,;\, g^{k+1}(y_{k+1}(j))]$$

where $g^{k+1}(y_{k+1}(j)) \in (f^{k+1}(x) - \epsilon \,;\, f^{k+1}(x) + \epsilon)$ if $\sigma_{k+1}(c_j) = +1$. Combining this with the
corresponding result for $\sigma_{k+1}(c_j) = -1$ proves that condition (1) is true for $i = k + 1$
given that (1) and (2) are true for $i = k$.

Next we show that if (1) and (2) are true for each $i \in \{1, 2, \ldots, k\}$ where $k \in
\{1, 2, \ldots, m_j - 2\}$, then (2) is true for $i = k + 1$. Suppose on the contrary that (2) is not
true for $i = k + 1$ so that $D_{k+1}(x, f, \epsilon) \cap C(f) \neq \emptyset$. Since $D_{k+1}(x, f, \epsilon) \subset B(f^{k+1}(x), \epsilon)$
we know that:

$$f^{k+1}(x) \in B(c, \epsilon) \tag{B.18}$$

for some $c \in C(f)$. From (B.8) and (B.9) we also know that:



r( Cj )^(c- c+-a t (c ] )e) (B.19) 



for any c G C(f) if i G {1, 2, . . . , m 3 - 2}. 

We now address two cases. First suppose that there exists some $t \in \{1, 2, \ldots, k\}$ and
$c \in C(f)$ such that:

$$c \in (f^t(x) \,;\, f^t(c_j)) \tag{B.20}$$

Let $t$ be the minimum value for which (B.20) holds for any $c \in C(f)$. Since $t$ is minimal
we know that $f^t$ must be monotone on $(x ; c_j)$ so that:

$$\sigma_t(c_j)(f^t(c_j) - f^t(x)) > 0.$$

Combining this result with (B.20) and (B.19) we find that:

^(c i )(f(c j )-f(x))>^e. (B.21) 

Now suppose there exists no i G {1, 2, . . . , &}, such that: 

c G (f(x) ; f( Cj )) 

for any c G C(f). Note that since we assume (2) is true for i < k } this means there exists 
no i G {1, 2, . . . , &}, such that: 

ce(f(x) ; /*'( Cj ))UA(a:,/,e). 

for any c G C(f). Then for any i G {1, 2, . . . , k + 1}, we know that j l is monotone on 
(x\ Cj) U Ji(x, /, e). Thus, for any z G Di(x } /, e) we have: 

^(9)Cf(9)-^)>o 
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and from (B.18) and (B.19): 

°k + i(c j )(f k+1 (c j )-f k+1 (x))>le. (B.22) 

From (B.21) and (B.22) we have shown that if (2) is satisfied for any i £ {1 , 2, . . . , k} 
then there exists t < k + f such that: 

<7 t (c J )(f(c J )-f t (x))> 3 -e. 

This implies that: 

^(9X^(9) " f( x )) > e 

so Cj £" J t (x } g } e). Thus there exists some £ £ {0, f, . . . ,t — 1} such that Cj £ Ji(x } g } e) 
for any i satisfying f < i < £ but Cj £" Ji + i(x } g } e). Since Di(x } g } e) fl C(/) = for any 
z £ {1, 2, . . . ,£} we know that: 

vz +1 (c 3 )(f +1 (c 3 )-f+\x))>U. 

Consequently, since Cj £" Ji + \(x } </,e), it is apparent that: 

* t+1 (c j )tf +1 (c j )-f e+1 (x))>e. 

Thus, since Dt(x,g,e) is connected, and since g i+1 (cj) £ g(Df(x,g,e), we know that 
f i+1 (x) + <jf + i(cj)e must be an endpoint of Df + i(x,g,e) = g(Df(x,g,e) C\ B(f^(x),e) 
where £-\-l<t<k-\-l. This contradicts (f) for i = £ -\- 1 < k -\- 1. But we have 
already shown that if (f ) and (2) are satisfied for i £ {1,2,..., A;}, then (1) is satisfied 
for i = k + 1. Thus if (1) and (2) are satisfied for i £ {1,2, ...,&}, then (2) is also 
satisfied for i = k + 1. 

We now need to show that (1) is true for $i = 1$. By definition, we can write:
$D_1(x, g, \epsilon) = g[(x - \epsilon, x + \epsilon)] \cap B(f(x), \epsilon)$. If condition (I) is not satisfied, then $D_1(x, g, \epsilon) \subset
(f(x) - \epsilon, f(x) + \epsilon)$ and at least one endpoint of $D_1(x, g, \epsilon)$ has to correspond either to a
maximum or minimum point of $g$ in the interior of $J_1(x, g, \epsilon)$. Since $s > 9$, and since all
the turning points of $f$ are separated by at least $16\epsilon$, we know that the other endpoint
of $D_1(x, g, \epsilon)$ must be $f(x) - \sigma_1(c_j)\epsilon$. Thus $D_1(x, g, \epsilon)$ has the form given in (B.17).

Now we show that (2) is true for i = 1. Suppose that Di(x } g } e) D C(f) ^ 0. Then 
(Ti(cj)(f(x) — c) < e for some c £ C(f). If x £ B(cj, e) and rrij > 1 then <Ti(cj)(/(cj) — c) > 
|e for any c £ C(f). Thus <ti(cj)(/(cj) — f(x)) > |e which means that ai(cj)(g(cj) — 
f( x )) ^ e - This contradicts (1) for i = 1 and completes the proof of parts (1) and (2) of 
the claim. 

We now show that condition (3) of the claim holds. Suppose on the contrary that
there exists $x \in B(c_j, \epsilon)$ for some $j \in \{1, 2, \ldots, q\}$ such that $y_i(j) \notin J_i(x, f, \epsilon)$ for
some $i \in \{1, 2, \ldots, m_j\}$. Then there exists a $k \in \{0, 1, \ldots, i - 1\}$ such that $y_{k+1}(j) \notin
J_{k+1}(x, f, \epsilon)$ but $y_\ell(j) \in J_\ell(x, f, \epsilon)$ for any integer $\ell$ satisfying $1 \leq \ell \leq k$. We know that:

$$g^{k+1}(y_i(j)) \in (f^{k+1}(x) - \epsilon \,,\, f^{k+1}(x) + \epsilon).$$

And, since $|f^{k+1}(y_{k+1}(j)) - g^{k+1}(y_{k+1}(j))| < \delta_1$, we find that:

$$f^{k+1}(y_i(j)) \in (f^{k+1}(x) - \epsilon - \delta_1 \,,\, f^{k+1}(x) - \epsilon) \cup (f^{k+1}(x) + \epsilon \,,\, f^{k+1}(x) + \epsilon + \delta_1) \tag{B.23}$$

$$g^{k+1}(y_i(j)) \in (f^{k+1}(x) - \epsilon \,,\, f^{k+1}(x) - \epsilon + \delta_1) \cup (f^{k+1}(x) + \epsilon - \delta_1 \,,\, f^{k+1}(x) + \epsilon). \tag{B.24}$$

Also, substituting $f = g$ in part (1) of the claim, we can see that:

$$D_i(x, f, \epsilon) = (f^i(x) - \sigma_i(c_j)\epsilon \,;\, f^i(c_j)] \tag{B.25}$$

where $f^i(c_j) \in (f^i(x) - \epsilon \,,\, f^i(x) + \epsilon)$ for any $i \in \{1, 2, \ldots, m_j\}$ provided condition (I)
of the lemma is not satisfied. Now suppose $\sigma_{k+1}(c_j) = +1$ (the other case is analogous).
Then, since $y_i(j) \in J_k(x, f, \epsilon)$, we know that it cannot be true that $f^{k+1}(y_i(j)) >
f^{k+1}(x) + \epsilon$, since that would contradict (B.25). Thus we can drop one of the intervals
in each of the unions in (B.23) and (B.24). In particular we find that:

$$g^{k+1}(y_i(j)) \in (f^{k+1}(x) - \sigma_{k+1}(c_j)\epsilon \,;\, f^{k+1}(x) - \sigma_{k+1}(c_j)(\epsilon - \delta_1)). \tag{B.26}$$

This implies i ^ k + I since: 

if (T k+ i(cj) = +1: 

if a k+1 {cj) = -I: „ v „ 

zeJk + l(x,9,£) 

But since _Dfc +1 (:r, /, e) D C(f) = for k + I < raj we know from (B.25) that: 

(f k+1 (x) + a k+1 (c 1 )e) ; / fc+1 (x)) f| W) = 0. 

Thus from (B.26), since s > 9, it is clear that 

9 k+2 (yi(j)) ^D k+2 (x } g } e). 

This means that j/ 8 (j) ^ Ji(x } g } e) for any £ > & + 2, so z < k + 1. But we have 
already shown that i ^ k-\-l. Therefore i < k. But this contradicts our assumption that 
& £ {0, I, . . . , z — I}. This proves condition (3) and completes the proof of the claim. 

Returning to the proof of lemma B.2.7 we now assert that: 

E mj (x,g,e) D (H(x)-8<T mj ( Cj )e , g^(y mj (j))]. (B.27) 
</* +1 (y*+iO')) = 


sup g k+1 (z)>f k+1 (x)>g k+1 ( yi (j)) 

z£Jk + l(x,9,c) 


g k+1 (y k+ i(j)) = 


= inf a k+1 (z)<f k+1 (x)<g k+1 ( yi (j)). 



if x does not satisfy condition (I) of the lemma for any n(x) £ {1, 2, . . . , rrij}. It is 
clear that Di(x } g } e) C Ei(x } g } e) for each each i £ {1, 2, . . . , rrij}. We also know that 
\f(x) — g(x)\ < |e for all x £ I so that given the form of Di(x,g, e) in (B.17) and because 
of the expansion factor, s > 9, we have that: 

E l+1 (x.g,e)Dg(D l (x,g,e))nB(r +1 (x),8e). 

for any i £ {1, 2, . . . , raj — 1}. Setting i = m rj — 1, and substituting Di(x } g } e) in the 
equation above using (B.17), we get (B.27). 

Now suppose that o m \cj) = +1 (the case where o m \cj) = —1 is analogous). Then, 
from (B.10): 

n(cj)-c r{3) >m x . (B.28) 

Also, if condition (I) is not satisfied for some x £ B(cj, e), then since y m (j) £ D m (x, /, e) 
we know that f m3 {cj) > f mj (y m \j)) since D m _i(x, /, e) fl C(f) = 0. Thus, because 
\f m >{x)-g m i{x)\ <6 X : 

g mi (y mj (j)) - H (ci) < (n (y mj (j)) + s x ) - n (c 3 ) 

< (f»i ( Cj ) + s x ) - r< m 

< 8 X (B.29) 

^(ym^-ntc,) > g^( Cj )-r i (c j )>-S x . (B.30) 

Note that / has either a local maximum or a local minimum at c r ^y For dehniteness, 
assume that / has a local maximum at c r ^ (the other case is again analogous). Then, 
since \f(x) — g(x)\ < 8 X for all x £ /, there exists a local maximum of the map, g, at 
yi(r(j)) such that: 

9(yi(r(j))) = sup g(x) (B.31) 

x(E:B(c r (j\,8e) 
c 

and yi (r(j)) £ B(c r(lh 2-^). (B.32) 

since the turning points of / are separated by at least 16e distance. 

Consequently from (B.28), (B.30), (B.32), and since s > 9 we see that: 

g mi (y mj U))-yi(r(j)) 

= [cru) + (f mj (cj) - c rU) ) + (g m i(y mj (j)) - n(c 3 ))] - [c rU) + ( yi (r(j)) - C r{j) )] 

c 

> [fv(j) + 104 - 8 X ] - [fv(j) + 2—)] 

> 0. (B.33) 
Also, from (B.29), (B.ll), and (B.32) and since s > 9 and 8 < \e: 

g mi (y mj U))-yi(r(j)) 

= GT'CMi)) - r'te)) + (n(ci) - c rU) ) - (c rU) ) - yi (r(m 

<Sx + -e-2 S - 
2 s 

< 3e (B.34) 

Consequently, from (B.33), (B.34), and (B.27) we see that if x £ B(cj, e) does not satisfy 
condition (I), then 

yMJ))eE mj (x,g,e). (B.35) 

Furthermore, from (B.31) we also know that: 

9(yi(r(j))) = sup g(z). (B.36) 

zeE m -(x,g,t) 

If we assume o m (cj) = +1, then from (B.27), (B.29), (B.ll), (B.32), and since s > 9 
and 8 X < |e we have: 

9 mi {x) < g m '(y mj {j)) 

< n(c 3 ) + 8 x 

5 

2 



< Cr-(j) + -e + 8 X 



< y 1 (r(j)) + 2^ + \e + 8 x 

s 2 

< j/i(r(j)) + 3e (B.37) 

Still assuming a mj (cj) = +1, then from (B.27), (B.36), (B.37), and since 8 X < |e, and 
\f(x) — g(x)\ < 8 X for all x £ I : 

g(E mj (x,g,e)) D (g(g m >(x) - 8e) , g(yMJ))] 

2 (g(yi(r(j)) - 5e) , flf(j/i(r(j))] 

3 (^(yiC^O"))) -5se + 8 x , flf(j/i(r(j))] 

3 (9(yMJ))) - \se , flfMr(j))] (B.38) 

Finally, if a mj (cj) = +1, then since c r ^ < f mj (cj) < c r ^ + |e and s > 9, we know from 
(B.32) that c r (j) — |e < yi(r(j))) < c r ^ + 3e. Thus: 

g(B[f m *(x),e]) C (g( yi (r(J))) ~ ^ ~ &* , 9(vMJ)))] 

C (flf(j/i(r(j)))-^e , flf(j/i(r(j)))] (B.39) 
Consequently, from (B.38) and (B.39), we have that if $x \in V$ does not satisfy condition
(I) of lemma B.2.7 for any $n(x) \in \{1, 2, \ldots, m_j\}$, then:

$$g(E_{m_j}(x,g,\epsilon)) \supset g(B[f^{m_j}(x),\epsilon]),$$

satisfying condition (II) of the lemma. We already saw that condition (I) of the lemma is
satisfied for $n(x) = 1$ if $x \in I \setminus V$. This proves lemma B.2.7.

Proof of theorem 3.2.1:

Strong linking condition $\Rightarrow$ Function shadowing: Note that (B.6) in lemma B.2.5 may
be rewritten as:

$$D_{n(x)}(x,g,\epsilon) \supset B[f^{n(x)}(x),\epsilon]$$

and (B.7) in lemma B.2.6 may be rewritten as

$$g(E_{n(x)}(x,g,\epsilon)) \supset g(B[f^{n(x)}(x),\epsilon])$$

so we can see these two statements are the same as the conditions in lemma B.2.7.

For any $x \in I$, condition (I) of lemma B.2.7 implies that condition (II) must also
be true, since clearly $E_{n(x)}(x,g,\epsilon) \supset D_{n(x)}(x,g,\epsilon)$. Thus, combining
lemmas B.2.7 and B.2.6, we see that if $f : I \to I$ is uniformly piecewise linear with $s > 9$ and
the strong linking property, then $f$ must satisfy the function shadowing property on
$C^0(I,I)$. Furthermore, using lemma B.2.3, we can drop the requirement that $s > 9$.
We can do this since $s > 1$ for any uniformly piecewise linear map $f$, so there always
exists $n > 0$ such that $f^n$ is uniformly piecewise linear and satisfies $s > 9$. Thus, from
lemmas B.2.1 and B.2.2, we know that any transitive map $f : I \to I$ with the strong
linking property must also satisfy the function shadowing property on $C^0(I,I)$.

Function shadowing $\Rightarrow$ Strong linking condition: Suppose that $f$ is a piecewise linear
map that does not satisfy the strong linking condition. We shall first show that $f$ does
not satisfy the function shadowing property on $C^0(I,I)$.

If / does not satisfy the strong linking condition, then there is a c £ C(f) and e > 
such that there exists no z £ {7?(c, e) \ c] and n £ Z + satisfying f n (z) £ C(f) and 
\f % {c) — f % {z)\ < e for every z £ {1,2,... ,n}. We will show that if e £ (0, |e ), then for 
any 8 > there exists a g £ C°(7, 7) that satisfies d(f,g) < 8 but has the property that 
no orbit of g e— shadows the orbit, {/ 8 (c)}^ , of /. 

Now given 8 > and e < |e , choose g to be any map that satisfies the following 
properties: 

(1) 5 £C°(7,7) 
{2)g{c) = f{c)-a 1 {c)8 
(3) g(x) = f(x) for any x £ {7 \ 7?(c, e )}. 

(4) sup xeB(C)t) [a 1 (c)g(x)} = <7i(c)#(c) 

(5) <*(/,</)<* 

Set x 8 - = / 8 (c) and let j/ 8 - = g l (c) so that {j/ 8 } is an orbit of g. Suppose that k £ Z + such 
that (Ti(c)(xi — yi) < e for all i £ {0, 1, . . . , &}. We assert that 

<7,-(c)( a ;.--2,.-)>s , '- 1 £ (B.40) 

for any z £ {1, 2, . . . , k + 1}. It is not hard to show this assertion by induction. For any 
i £ {1, 2, . . . , k} we have that C(f) n (xf, yi) = and a, +1 (c)(f(y,) - g(yi)) > 0. Thus, 
since a i+1 (c)(f(xi) - f(yi)) = scr,(c)(x, - j/ 8 ), we have that 

<T i+ i(c)(f(xi) - g(yij) > (T i+1 (c)(f(xi) - f(yij) = sa % (c)(x % - yi) (B.41) 

so that if (B.40) is true for z, then it also must be true for z + 1, provided that i £ 
{1,2,..., A;}. 

But {j/i}j-=o does not e— shadow {xi}^ . We can see this from (B.40) and from our 
choice of k, since e < |e . Furthermore there is no orbit of g that more closely shadows 
{ x i}i=o than {j/i}j-=o • This is because for any u £ /, if i £ {1, 2, . . . , k} and u £ Ji(c, g, e), 
then (g l (u); x^) C\ C(f) = since e < |e . Also, using property (4) of our choice of g, 
we can show that ■sup ze j t ^ C:g ^[(Ti(c)g t (z)] = cr 8 (c)(/ 8 (c) for any i £ {1,2,..., A; + 1} by 
induction on i. 

Consequently, if $f$ is a piecewise linear map that does not satisfy the strong linking
condition, then it cannot satisfy the function shadowing property on $C^0(I,I)$. Since the
function shadowing property is preserved by topological conjugacy (lemma B.2.2), this implies
that a transitive piecewise monotone map cannot exhibit function shadowing on $C^0(I,I)$
if it does not satisfy the strong linking condition.

This concludes the proof of theorem 3.2.1.

Appendix C



Proof of theorem 3.3.1 



This appendix contains the proof for theorem 3.3.1. I have made an effort to make the 
appendix as self-contained as possible, so that the reader should be able to find most of 
the relevant definitions and explanations in this appendix. Naturally, this means that 
the appendix repeats some material found elsewhere in this report. 



C.1 Definitions and statement of theorem

We first repeat the related definitions, which are the same as those found in chapter 3.
Throughout this appendix we shall assume that $I \subset \mathbb{R}$ represents a compact interval of
the real line.

Definitions: Suppose that $f : I \to I$ is continuous. Then the turning points of $f$ are
the local extrema of $f$ in the interior of $I$. $C(f)$ is used to designate the set of all turning
points of $f$ on $I$. Let $C^r(I,I)$ be the set of continuous maps on $I$ such that $f \in C^r(I,I)$
if the following three conditions hold:

(a) $f$ is $C^r$ (for $r \ge 0$).

(b) $f(I) \subset I$.

(c) $f(\mathrm{Bd}(I)) \subset \mathrm{Bd}(I)$ (where $\mathrm{Bd}(I)$ denotes the boundary of $I$).

If $f \in C^r(I,I)$ and $g \in C^r(I,I)$, let $d(f,g) = \sup_{x \in I} |f(x) - g(x)|$.
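The distance $d(f,g)$ defined above can be approximated numerically. The following is a minimal sketch, assuming a uniform sampling grid on $I = [0,1]$; the helper name `sup_distance` and the two example maps (a tent map and a small perturbation of it) are illustrative choices that do not appear in this report.

```python
import math

def sup_distance(f, g, a=0.0, b=1.0, samples=10001):
    """Approximate d(f, g) = sup over [a, b] of |f(x) - g(x)| on a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + k * step) - g(a + k * step)) for k in range(samples))

# Hypothetical example maps on I = [0, 1].
def tent(x):
    return 1.0 - abs(2.0 * x - 1.0)

def perturbed_tent(x):
    # The perturbation vanishes at x = 0 and x = 1, so the boundary of I
    # is still mapped into the boundary of I.
    return tent(x) - 0.01 * math.sin(math.pi * x)

if __name__ == "__main__":
    print(sup_distance(tent, perturbed_tent))
```

A grid estimate can only underestimate the true supremum, so it should be read as a lower bound that improves as `samples` grows.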

We will primarily restrict ourselves to maps with the following properties:

(C0) $g : I \to I$ is piecewise monotone.

(C1) $g$ is $C^2$ on $I$.

(C2) Let $C(g)$ be the finite set such that $c \in C(g)$ if and only if $g$ has a local extremum
at $c \in I$. Then $g''(c) \ne 0$ if $c \in C(g)$ and $g'(x) \ne 0$ for all $x \in I \setminus C(g)$.

Under the Collet-Eckmann conditions, there exist constants $K_E > 0$ and $\lambda_E > 1$
such that for some $c \in C(g)$:

(CE1) $|Dg^n(g(c))| > K_E \lambda_E^n$

(CE2) $|Dg^n(z)| > K_E \lambda_E^n$ if $g^n(z) = c$.

for any $n > 0$.
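Condition (CE1) can be checked numerically along a finite piece of the critical orbit. The sketch below assumes the quadratic map $g(x) = 4x(1-x)$ as a stand-in example (it is not a map singled out in this appendix); the function names are hypothetical. For this map the critical value $g(c) = 1$ lands on the fixed point $0$ after one step, so $|Dg^n(g(c))| = 4^n$ and (CE1) holds with, for instance, $K_E = 1/2$ and $\lambda_E = 4$.

```python
def g(x, p=4.0):
    """Quadratic map used only as an illustrative example."""
    return p * x * (1.0 - x)

def dg(x, p=4.0):
    """Derivative of g with respect to x."""
    return p * (1.0 - 2.0 * x)

def derivative_along_orbit(x0, n, p=4.0):
    """Dg^n(x0) computed as the chain-rule product of g'(g^k(x0)), k = 0..n-1."""
    x, prod = x0, 1.0
    for _ in range(n):
        prod *= dg(x, p)
        x = g(x, p)
    return prod

if __name__ == "__main__":
    c = 0.5
    for n in range(1, 6):
        print(n, abs(derivative_along_orbit(g(c), n)))
```

A finite check of this kind cannot establish (CE1) for all $n$; it only illustrates the growth the condition asks for.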

We consider one-parameter families of mappings, $f_p : I_x \to I_x$, parameterized by
$p \in I_p$, where $I_x \subset \mathbb{R}$ and $I_p \subset \mathbb{R}$ are closed intervals of the real line. Let
$f(x,p) = f_p(x)$ where $f : I_x \times I_p \to I_x$. We are primarily interested in one-parameter
families of maps with the following characteristics:

(D0) For each $p \in I_p$, $f_p : I_x \to I_x$ satisfies (C0) and (C1). We also require that $C(f_p)$
remains invariant with respect to $p$ for all $p \in I_p$.

(D1) $f : I_x \times I_p \to I_x$ is $C^2$ for all $(x,p) \in I_x \times I_p$.

Note that the following notation will be used to express derivatives of $f(x,p)$ with respect
to $x$ and $p$.

$$D_x f(x,p) = \frac{\partial f}{\partial x}(x,p) \tag{C.1}$$

$$D_p f(x,p) = \frac{\partial f}{\partial p}(x,p). \tag{C.2}$$

The Collet-Eckmann conditions specify that derivatives with respect to the state,
$x$, grow exponentially. Similarly we will also be interested in families of maps where
derivatives with respect to the parameter, $p$, also grow exponentially. In other words,
we require that there exist constants $K_p > 0$, $\lambda_p > 1$, and $N > 0$ such that for some
$p_0 \in I_p$, and $c \in C(f_{p_0})$:

(CP1) $|D_p f^n(c,p_0)| > K_p \lambda_p^n$

for all $n > N$. This may seem to be a rather strong constraint, but in practice it often
follows whenever (CE1) holds. We can see this by expanding with the chain rule:

$$D_p f^n(c,p_0) = D_x f(f^{n-1}(c,p_0),p_0)\, D_p f^{n-1}(c,p_0) + D_p f(f^{n-1}(c,p_0),p_0) \tag{C.3}$$

to obtain the formula for $D_p f^n(c,p_0)$:

$$D_p f^n(c,p_0) = D_p f(f^{n-1}(c,p_0),p_0) + \sum_{i=0}^{n-2} \Big[ D_p f(f^i(c,p_0),p_0) \prod_{j=i+1}^{n-1} D_x f(f^j(c,p_0),p_0) \Big].$$

Thus, if $|D_x f^n(f(c,p_0),p_0)|$ grows exponentially, we expect $|D_p f^n(c,p_0)|$ to also grow
exponentially unless the parameter dependence is degenerate in some way.
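The chain-rule recursion (C.3) and its unrolled sum-of-products form can be compared numerically. The sketch below assumes the quadratic family $f(x,p) = p\,x(1-x)$ with turning point $c = 1/2$ purely as an example; the family and all function names are illustrative assumptions, not definitions taken from this report. It uses $D_p f^0(c,p) = 0$, which holds because $c$ does not depend on $p$.

```python
import math

def f(x, p):
    """Example one-parameter family (an assumption for illustration)."""
    return p * x * (1.0 - x)

def dfdx(x, p):
    return p * (1.0 - 2.0 * x)

def dfdp(x, p):
    return x * (1.0 - x)

def param_derivative_recursive(c, p, n):
    """D_p f^n(c, p) via the recursion (C.3), starting from D_p f^0 = 0."""
    x, d = c, 0.0
    for _ in range(n):
        d = dfdx(x, p) * d + dfdp(x, p)
        x = f(x, p)
    return d

def param_derivative_sum(c, p, n):
    """The same quantity via the unrolled sum of products."""
    orbit = [c]
    for _ in range(n - 1):
        orbit.append(f(orbit[-1], p))      # orbit[k] = f^k(c, p)
    total = 0.0
    for i in range(n):
        prod = 1.0
        for j in range(i + 1, n):
            prod *= dfdx(orbit[j], p)
        total += dfdp(orbit[i], p) * prod  # the i = n-1 term has an empty product
    return total

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, param_derivative_recursive(0.5, 3.8, n), param_derivative_sum(0.5, 3.8, n))
```

The recursive form costs $O(n)$ work and is the natural choice in practice; the explicit sum is included only to mirror the formula in the text.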

Now for any $c \in C(f_{p_0})$ define $\sigma_n(c,p)$ recursively as follows:

$$\sigma_{n+1}(c,p) = \mathrm{sgn}\{D_x f(f^n(c,p),p)\}\, \sigma_n(c,p)$$

where

$$\sigma_1(c,p) = \begin{cases} +1 & \text{if } c \text{ is a relative maximum of } f_p \\ -1 & \text{if } c \text{ is a relative minimum of } f_p \end{cases}$$

Basically $\sigma_n(c,p) = 1$ if $f_p^n$ has a relative maximum at $c$ and $\sigma_n(c,p) = -1$ if $f_p^n$ has a
relative minimum at $c$. We can use this notion to distinguish a particular direction in
parameter space.
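The recursion for $\sigma_n(c,p)$ is straightforward to evaluate along an orbit. The sketch below again assumes the quadratic family $f(x,p) = p\,x(1-x)$ as an illustrative example; `sigma_sequence` and its arguments are hypothetical names.

```python
def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def f(x, p):
    """Example family, assumed only for illustration."""
    return p * x * (1.0 - x)

def dfdx(x, p):
    return p * (1.0 - 2.0 * x)

def sigma_sequence(c, p, n_max, c_is_max=True):
    """Return [sigma_1, ..., sigma_{n_max}] for the turning point c of f_p."""
    sigmas = [1 if c_is_max else -1]
    x = f(c, p)                       # x = f^1(c, p)
    for _ in range(1, n_max):
        sigmas.append(sign(dfdx(x, p)) * sigmas[-1])
        x = f(x, p)
    return sigmas

if __name__ == "__main__":
    print(sigma_sequence(0.5, 3.8, 8))
```

A value of 0 in the output would mean the orbit of $c$ has landed exactly on a turning point, in which case the recursion no longer distinguishes a maximum from a minimum.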

Definition C.1.1 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (D0) and (D1). Suppose that there exists $p_0 \in I_p$ such that $f_{p_0}$ satisfies (CE1)
and (CP1) for some $c \in C(f_{p_0})$. Then we say that the turning point $c$ of $f_{p_0}$ favors higher
parameters if there exists $N' > 0$ such that

$$\mathrm{sgn}\{D_p f^n(c,p_0)\} = \mathrm{sgn}\{\sigma_n(c,p_0)\} \tag{C.4}$$

for all $n > N'$. Similarly, the turning point, $c$, of $f_{p_0}$ favors lower parameters if

$$\mathrm{sgn}\{D_p f^n(c,p_0)\} = -\mathrm{sgn}\{\sigma_n(c,p_0)\} \tag{C.5}$$

for all $n > N'$.
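Definition C.1.1 can be explored numerically by tabulating $\mathrm{sgn}\{D_p f^n(c,p_0)\}$ next to $\sigma_n(c,p_0)$ over a finite range of $n$. The sketch below assumes the quadratic family $f(x,p) = p\,x(1-x)$ at $p_0 = 4$, where the critical orbit is $1/2 \to 1 \to 0 \to 0 \to \ldots$ and every quantity is exact in floating point; the family, the parameter value, and the function names are illustrative assumptions. A finite table can suggest, but not prove, which direction is favored.

```python
def sign(v):
    return (v > 0) - (v < 0)

def f(x, p):
    return p * x * (1.0 - x)

def dfdx(x, p):
    return p * (1.0 - 2.0 * x)

def dfdp(x, p):
    return x * (1.0 - x)

def compare_signs(c, p0, n_max, c_is_max=True):
    """Return (n, sgn D_p f^n(c, p0), sigma_n(c, p0)) for n = 1..n_max."""
    rows = []
    x, d = c, 0.0
    sigma = 1 if c_is_max else -1               # sigma_1
    for n in range(1, n_max + 1):
        d = dfdx(x, p0) * d + dfdp(x, p0)       # D_p f^n via (C.3)
        x = f(x, p0)                            # x = f^n(c, p0)
        rows.append((n, sign(d), sigma))
        sigma = sign(dfdx(x, p0)) * sigma       # sigma_{n+1}
    return rows

def favored_direction(rows, n_prime):
    """Classify the tail n > n_prime according to (C.4) and (C.5)."""
    tail = [(s_d, s_sigma) for n, s_d, s_sigma in rows if n > n_prime]
    if tail and all(a == b for a, b in tail):
        return "higher"
    if tail and all(a == -b for a, b in tail):
        return "lower"
    return "undetermined"

if __name__ == "__main__":
    rows = compare_signs(0.5, 4.0, 10)
    print(rows)
    print(favored_direction(rows, 0))
```

In this example the two sign sequences agree for every tabulated $n$, which is consistent with the turning point favoring higher parameters in the sense of (C.4).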

The first thing to notice about these two definitions is that they are exhaustive if
(CP1) is satisfied. That is, if (CP1) is satisfied for some $p_0 \in I_p$ and $c \in C(f_{p_0})$, then
the turning point, $c$, of $f_{p_0}$ either favors higher parameters or favors lower parameters.
We can see this from (C.3). Since $|D_p f(x,p_0)|$ is bounded for $x \in I_x$, if $|D_p f^n(c,p_0)|$
grows large enough then its sign is dominated by the signs of $D_x f(f^{n-1}(c,p_0),p_0)$ and
$D_p f^{n-1}(c,p_0)$, so that either (C.4) or (C.5) must be satisfied.

Finally, if $p_0 \in I_p$ and $c \in C(f_{p_0})$, then for any $\epsilon > 0$, define $n_e(c,\epsilon,p_0)$ to be the
smallest integer $n \ge 1$ such that $|f^n(c,p_0) - c^*| < \epsilon$ for some $c^* \in C(f_{p_0})$. We say that
$n_e(c,\epsilon,p_0) = \infty$ if no such $n \ge 1$ exists.
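The quantity $n_e(c,\epsilon,p_0)$ is a first-return time of the critical orbit to an $\epsilon$-neighborhood of the turning points, and it can be computed by direct iteration. The sketch below assumes the quadratic family $f(x,p) = p\,x(1-x)$ as an example and uses a hypothetical iteration cap `n_cap`: returning infinity once the cap is reached is an approximation, since a numerical search cannot rule out a later return.

```python
import math

def f(x, p):
    """Example family, assumed only for illustration."""
    return p * x * (1.0 - x)

def first_return_time(c, p0, eps, turning_points, n_cap=10000):
    """Smallest n >= 1 with |f^n(c, p0) - c*| < eps for some turning point c*.

    Returns math.inf if no such n is found within n_cap iterations.
    """
    x = c
    for n in range(1, n_cap + 1):
        x = f(x, p0)
        if any(abs(x - cs) < eps for cs in turning_points):
            return n
    return math.inf

if __name__ == "__main__":
    print(first_return_time(0.5, 3.8, 0.1, [0.5]))
    print(first_return_time(0.5, 4.0, 0.1, [0.5]))
```

At $p_0 = 4$ the critical orbit is absorbed by the fixed point $0$, so it never returns near $c = 1/2$ and the search reports infinity.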

We are now ready to state the main result of this appendix.

Theorem 3.3.1 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (D0) and (D1). Suppose that (CP1) is satisfied for some $p_0 \in I_p$ and
$c \in C(f_{p_0})$. Suppose further that $f_{p_0}$ satisfies (CE1) at $c$, and that the turning point, $c$, favors
higher parameters under $f_{p_0}$. Then there exist $\delta p > 0$, $\lambda > 1$, $K' > 0$, and $K > 1$, such
that if $p \in (p_0 - \delta p, p_0)$, then for any $\epsilon > 0$, the orbit $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$ is not $\epsilon$-shadowed by
any orbit of $f_p$ if $|p - p_0| > K' \epsilon \lambda^{-n_e(c,K\epsilon,p_0)}$.

The analogous result also holds if $f_{p_0}$ favors lower parameters.

C.2 Proof

Lemma C.2.1 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (D0) and (D1). Then given $p_0 \in I_p$, there exist constants $K_1 > 0$, $K_2 > 0$, and
$K_3 > 0$ such that the following properties are satisfied:

(1) $|D_x f(x_1,p_0) - D_x f(x_2,p_0)| < K_1 |x_1 - x_2|$ for any $x_1 \in I_x$ and $x_2 \in I_x$.

(2) Let $\delta x > 0$ be the maximal value such that $|x - c^*| < \delta x$ implies $|D_x^2 f(x,p_0)| > 0$
for any $c^* \in C(f_{p_0})$. Then $|D_x f(x,p_0)| > K_2 |x - c|$ if $|x - c| < \delta x$ for some
$c \in C(f_{p_0})$.

(3) Fix $c \in C(f_{p_0})$. Then $|D_x f(x,p) - D_x f(x,p_0)| < K_3 |x - c| |p - p_0|$ for any $x \in I_x$
and $p \in I_p$.

Proof of (1): (1) is true since $f(x,p)$ is $C^2$ and $I_x \times I_p$ is compact.

Proof of (2): From (C2) we know that it is possible to choose a $\delta x > 0$ as specified. Let
$c \in C(f_{p_0})$ and $x \in I_x$. By the mean value theorem:

$$|D_x f(x,p_0)| = |D_x^2 f(y,p_0)|\,|x - c|$$

for some $y \in [c; x]$. Now set:

$$K_2 = \frac{1}{2} \inf_{y \in [c - \frac{1}{2}\delta x,\; c + \frac{1}{2}\delta x]} |D_x^2 f(y,p_0)|.$$

From our choice of $\delta x$, we know $K_2 > 0$. Thus if $|x - c| < \frac{1}{2}\delta x$, we have that:

$$|D_x f(x,p_0)| > 2 K_2 |x - c|.$$

But since $|D_x^2 f(y,p_0)| > 0$ if $|y - c| < \delta x$, it is evident that
$|D_x f(y,p_0)| > |D_x f(c + \frac{1}{2}\delta x, p_0)|$ for any $y \in (c + \frac{1}{2}\delta x, c + \delta x)$. Similarly
$|D_x f(y,p_0)| > |D_x f(c - \frac{1}{2}\delta x, p_0)|$ if $y \in (c - \delta x, c - \frac{1}{2}\delta x)$.
Thus:

$$|D_x f(x,p_0)| > K_2 |x - c|$$

for any $x$ satisfying $|x - c| < \delta x$.

Proof of (3): Fix $c \in C(f_{p_0})$ and $p_0 \in I_p$. Then for any $x \in I_x$ and $p \in I_p$, let:

$$q(x,p) = D_x f(x,p) - D_x f(x,p_0).$$

Since $f$ is $C^2$, $q$ must be $C^1$. It is clear that:

$$q(c,p) = 0 \tag{C.6}$$

for all $p \in I_p$ and

$$q(x,p_0) = 0 \tag{C.7}$$

for all $x \in I_x$.

From (C.7) and since $q(x,p)$ is $C^1$, $q(x,p)$ satisfies a Lipschitz condition on $I_x \times I_p$
so that there exists a constant $C > 0$ such that:

$$|q(x,p)| < C |p - p_0| \tag{C.8}$$

for any $(x,p) \in I_x \times I_p$. Now define

$$r(x,p) = \begin{cases} \dfrac{q(x,p)}{p - p_0} & \text{if } p \ne p_0 \\ D_p q(x,p_0) & \text{if } p = p_0 \end{cases} \tag{C.9}$$

Note that from (C.8), $|r(x,p)| < C |I_p|$ for any $(x,p) \in I_x \times I_p$ such that $p \ne p_0$. Since $r$ is
bounded and $q(x,p)$ is $C^1$, it is fairly easy to check that $r(x,p)$ is $C^1$ for all $(x,p) \in I_x \times I_p$.

From (C.9) and (C.7), we see that:

$$q(x,p) = r(x,p)(p - p_0) \tag{C.10}$$

for all $(x,p) \in I_x \times I_p$. Also from (C.6) we know $r(c,p) = 0$ for all $p \in I_p$. Thus since
$r(x,p)$ is $C^1$, there exists $K_3 > 0$ such that $|r(x,p)| < K_3 |x - c|$ for any $(x,p) \in I_x \times I_p$.
Substituting this into (C.10) we find that:

$$|q(x,p)| < K_3 |x - c| |p - p_0|$$

for any $(x,p) \in I_x \times I_p$. This proves part (3) of the lemma.

Lemma C.2.2 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (C0) and (C1). Suppose that $f_{p_0}$ satisfies (CE1) for $p_0 \in I_p$ and some turning
point, $c \in C(f_{p_0})$. Suppose that turning point $c$ of $f_{p_0}$ favors higher parameters. Given
any $\lambda_0 > \lambda_1 > 1$, there exist constants $K > 1$, $\delta p > 0$ and $\epsilon_0 > 0$ such that for any $\epsilon < \epsilon_0$,
if $|p - p_0| < \delta p$, $|f^i(c,p) - f^i(c,p_0)| < \epsilon$, and $|f^i(c,p_0) - c^*| > K\epsilon$ for all $c^* \in C(f_{p_0})$ and
$1 \le i \le n$ then:

$$\frac{|D_x f(f^i(c,p),p)|}{|D_x f(f^i(c,p_0),p_0)|} > \frac{\lambda_1}{\lambda_0} \tag{C.11}$$

for all $1 \le i \le n$.

Proof: We first describe possible choices for $K > 1$, $\delta p > 0$, and $\epsilon_0 > 0$. We then show
that these choices in fact satisfy (C.11).

Fix $\delta x > 0$ such that

$$D_x^2 f(x,p_0) \ne 0 \quad \text{if } |x - c^*| < \delta x$$

for any $c^* \in C(f_{p_0})$. Then let:

$$J_x = \{x \in I_x : |x - c^*| > \delta x \text{ for any } c^* \in C(f_{p_0})\}.$$

Set $M_x = \inf_{x \in J_x} |D_x f(x,p_0)|$ and define:

A(a) = sup sup \D p f(x,p) — D p (x,po)\. 

xelx pE[po~a,po + a] 

Now let K\ > 0, A^2 > 0, and A 3 > be the constants from lemma C.2.1. Choose: 



2K X 

KM 



K= ^ \ lV (C12) 



Note that since K\ > K 2} we know that K > 1. Choose 8p\ > such that: 

A(8 Pl ) < ^(1 - ±). (C.13) 

Let 8p 2 = g(l - |i) and set 

<5p = min{<5pi, <5p 2 }- (C.14) 

Finally, hx 

. ; Ma; Ai fe, , n , 

eo = nnn{ — (l --),-}. (C.15) 

In order to show (C.11) it is sufficient to show:

$$A(i,p,p_0) < 1 - \frac{\lambda_1}{\lambda_0} \tag{C.16}$$

where

$$A(i,p,p_0) = \frac{|D_x f(f^i(c,p),p) - D_x f(f^i(c,p_0),p_0)|}{|D_x f(f^i(c,p_0),p_0)|}. \tag{C.17}$$

For each 1 < i < n we now consider two possibilities: 

(1) \f(c,p) — c*| > <*>£ for some c* G C(f Po ) 

(2) A"e < |/ 8 (c,p ) — c*| < 8x for some c* G C(f Po ). 
(Note that we know $K\epsilon < \delta x$ from (C.15).)

From now on we assume that $|p - p_0| < \delta p$, $|f^i(c,p) - f^i(c,p_0)| < \epsilon$, and
$|f^i(c,p_0) - c^*| > K\epsilon$ for all $c^* \in C(f_{p_0})$ and $1 \le i \le n$. We wish to show that (C.16) is
true for both cases (1) and (2) above for each $1 \le i \le n$.

In case (1) using (C.13), (C. 14), (C.15), (C.17), and lemma C.2.1 we have: 

Afrnpo) < J (\D x f(f(c,p),p) - D x f(f(c }Po ) }P )\ 

+ \D x f(f(c,po),p) - D x f(f(c,p ),p )\) (C.18) 

< Kl ^ p \: r{c ^ ^(\p-p,\) 

< Kle ° Mx (l - b.) 

M x 2 [ X } 

< K L M ± A, 1 A, 

M x 2K t y X J 2 y X J 

which proves the lemma for case (1). 

In case (2), if Ke < |/ 8 (c,p ) — c*| < <5x, for some c* G C(f Po ) then from lemma C.2.1, 
(C.18), (C.15), and (C.12): 

Ar s ^ ^i|/' ( c ,p) -/'' (c, Po)| +K 3 \f (c,po) -c*\\p-po\ 
A{i,p,Po) < 



K 2 \f i (c,p ) -c m \ 



K l e , K 3 \p~ Po\ 



K 2 (Ke) K 2 

This proves the lemma. 

Lemma C.2.3 Suppose that there exist constants $C > 0$, $N_0 > 0$ and $\lambda_0 > 1$ such that

$$|D_p f^i(c,p_0)| > C \lambda_0^i \tag{C.19}$$

for all $i > N_0$ where $p_0 \in I_p$. Suppose also that there exist $\delta p > 0$ and $\lambda_1 \in (1, \lambda_0)$ such
that for some $n > N_0$:

$$\frac{|D_x f(f^i(c,p),p)|}{|D_x f(f^i(c,p_0),p_0)|} > \frac{\lambda_1}{\lambda_0} \tag{C.20}$$

for all $1 \le i \le n$ if $|p - p_0| < \delta p$. Then for any $\lambda_2 \in (1, \lambda_1)$, there exist $N_1 > 0$
(independent of $n$ and $\delta p$) and $\delta p_1 > 0$ (independent of $n$) such that

$$|D_p f^i(c,p)| > C \lambda_2^i$$

for all $N_1 < i < n + 1$ if $|p - p_0| < \delta p_1$.

Proof: Given A > 1, fix 1 < A 2 < Ai < A . Set M p = sup^gj \D p f(x } p )\ and define: 

z{i) = (^ - 1)C A 2 +1 - M p ^(^y +1 - 2M P (C.21) 

Ao M Ao 

It is apparent that $z(i) \to \infty$ as $i \to \infty$. Thus, it is possible to choose $N_2 > 0$
(independent of $n$ and $\delta p$) so that $z(i) > K_0 |I_p|$ for all $i > N_2$ where $K_0 > 0$ is the constant
from lemma C.2.1 such that:

$$|D_p f(x,p) - D_p f(x,p_0)| < K_0 |p - p_0|$$

for any $x \in I_x$ and $p \in I_p$. Let $N_1 = \max\{N_0, N_2\}$.

We now prove the lemma by induction on i for Ni < i < n. From (C.19), and since 
|D p / 8 (c, p)\ is continuous with respect to p } there exists 6p 2 > such that 

\D p f Nl (c,p)\>C\* (C.22) 

if \p — p \ < 6p 2 . Set Spi = min{Sp } 6p 2 }. Thus, since 8p\ > is independent of n, to 
prove the lemma it is sufficient to show that: 

\D p f(c,p)\ ^ f \ 2 



\D P P{c,p )\ A 
implies 



> (t 1 ) 8 (C23) 



D P f t+1 (c,p)\ f \2 v+ i 



\D p f+i(c }Po )\ v A y 
for any |p — j9 1 < &Pi if N\ < i < n. 

Let E = \Dp};+i(c-po)\ and let A = \ D ^f(f t ( c ^Po)^Po)D p (c,p )\. Then, expanding by 
the chain rule: 

E ^ \D p f+\c } p)\ 



D p f^(c, Po ) 



\DJ(r(c,p),p)D P f l (c,p)\ - \D p f(r(c,p),p)\ 
\D x f(f(c }Po ) } p )D p f(c } p )\ + \D p f(f(c } p ) }Po )\ 
Using (C.20) and (C.23): 

\D x f(f i (c,p),p)D p f(c,p)\ 

= ^\D x f(r(c,p ),p )\(^y\D P r(c,p )\ 

= (¥r +1 ^A (C.25) 

Also, we know for lemma C.2.1 that there exists K > such that: 

\D p f(f(c,p),p)\ 

< \D p f(f(c,p),p) - D p f(f(c,p), Po )\ + \D p f(f(c,p), Po ) - D p f(r(c,p ),p )\ 

< Ko\p-p \ + 2M p (C.26) 

Thus, substituting (C.25) and (C.26) into (C.24): 
[^±A-(K \p-p \ + 2M p ) 



E> 



A + M p 
A 2v+1 ^ (iY +1 (^ - l)A-(K \p- Po \ + 2M p )-M p (^ 



> (fj +1+ ° " A + M P ° • < C " 27 ) 

Since \D p f t+1 (c } p )\ < A + M v and from (C.19) we have that 

A > C Xo +1 - M p (C.28) 

Substituting (C.28) into (C.27) and from (C.21) we have: 

A 2 v+1 ^ (t ~ l )Co X 2 +1 - M pT 2 (t ) 1+1 ~ 2M p ~ K o\P ~ Pol 



E> (—) t+1 + 

V A + M p 

> (^i) 8+1 + z ( l ) ~ K °\P ~ P°\ 



Xo A + M p 

Since z(i) > Ko\p — Po\ } for i > N\ } we have that: 

if Ni < i < n which proves the lemma. 

Lemma C.2.4 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (C0) and (C1). Suppose that $f_{p_0}$ satisfies (CE1) and (CP1) for $p_0 \in I_p$ and some
$c \in C(f_{p_0})$. Then there exist constants $\epsilon_0 > 0$, $K > 1$, $N_1 > 0$, $\lambda > 1$, and $\delta p > 0$ such
that for any positive $\epsilon < \epsilon_0$, if $p \in B(p_0, \delta p)$ then for any $n < n_e(c,\epsilon,p_0)$ the following
two conditions are true:

(1) If $|f^i(c,p) - f^i(c,p_0)| < \epsilon$ for every $1 \le i \le n$, then

$$|D_p f^j(c,p)| > C \lambda^j$$

for any $N_1 < j < n + 1$.

(2) $$\max_{N_1 < i < n} |f^i(c,p) - f^i(c,p_0)| > \min\{\epsilon, C \lambda^i |p - p_0|\}.$$

JVl <«<n 

Proof: If f(x,po) for c G C(f Po ) then there exists C > 0, iVo > 0, and A > such that: 

IAJ 8 (c,Po)| > CA* 

for all i > iVo. Choose A and Ai such that 1 < A < Ai < A . Then from lemma C.2.2 
we know that there exists K > 1, 8p\ > 0, and t\ > such that for any e < ei, if 
p G B(p 0} Spi), n < n e (c, A e, p ) } and |/ 8 (c, p) — / 8 (c, p )| < e for 1 < * < n ? then: 

lW(c,p),p)| < A x 



|^(/ 8 (c,Po),Po)| A 



for any 1 < i < n. From lemma C.2.3, this implies that there exists e > 0, 6p 2 > 0, 
and Ni > such that for any e < e , if p G B(p 0} 6p 2 ) and |/ 8 (c,p) — / 8 (c,p )| < e for 
1 < z < n, then: 



") 



|AP(c,p)|>CA J (C.29) 

for any j satisfying Ni < j < n + 1, provided that n < n e (c, Ke } p ). This proves part 
(1) of the lemma. It also implies that 

\f(c,p)-r(c,p )\>cy\p-po\ (c.30) 

for any Ni < z < n + 1 if n < n e (c, Ke } p ). 
Now dehne: 

fl'(p) = max |/'(c,p) - .f ( c ,Po)| 

l<8<iVl 

for any p G I p . Since f(x,p) is C 2 and ID^/^ 1 (c, p )| > CAq 1 , there exists Sp 3 > such 
that g(p) is monotonically increasing in the interval \po,Po + Sp 3 ] and monotonically 
decreasing in the interval [p — Sp 3} p ]. Choose Sp = min{6p 2} 6p 3 }. 

Now fixe < e . For each n > 0, dehne J n to be the largest connected interval such that 
p G J n implies that \f l (c,p) — f l (c,p )\ < e for 1 < 1 < n, po G J n , and J n C B(p ,Sp). 
In order to prove part (2) of the lemma it is sufficient to show that for any p G B(p 0} Sp) 
if Ni < n < n e (c, Ke } p ) } then either (a) p £ J n which implies |/ 8 (c,p) — / 8 (c,p )| > CA 8 
for all iVo < i < n or (b) p $. J n which implies that |/ 8 (c, p) — / 8 (c, p )\ > e for some 
Ni < i < n. Case (a) has already been proved above (see (C.30)). We now prove case 
(b). 

First of all note that by our choice of Sp and J n} if p G B(p 0} 6p) } then either p G J/Vi 
or |/ 8 (c,p) — / 8 (c,p )| > e for some 1 < i < iVi. Now hx pi G B(p 0} 6) and suppose 
that pi G - J n , for some n satisfying Ni < n < n e (c, Ke } p ). Then, since J 8 - D J 8+ i for 
all i > Ni } we know that if there exists k < n such that pi G Jfc \ Jfc+i where iVi < 
k < n e (c,Ke,p ). But for any p G Jfc we know (see (C.29)) that |D/ fc+1 (c,p)| > CX k+1 . 
Thus (f k+1 (c,p) — f k+1 (c,po)) must be monotone with for all p G </&. Consequently if 
Pi G J fc \ Jfc+i then |/ fc+1 (c,pi) - f k+1 (c,p )\ > e where TVi < k < n e (c,Ke,po). This 
proves the lemma. 

Lemma C.2.5 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (C0) and (C1). Suppose that $f_{p_0}$ satisfies (CE1) for some $p_0 \in I_p$ and $c \in C(f_{p_0})$.
For any $p \in I_p$ and $n > 0$ define:

$$V_n(p,\epsilon) = \{x \in I_x : |f^i(x,p) - f^i(c,p_0)| < \epsilon \text{ for all } 0 \le i \le n\}$$

Then there exists $\epsilon_0 > 0$ such that for any positive $\epsilon < \epsilon_0$, and any $1 \le n < n_e(c,\epsilon,p_0)$:

$$\sup_{x \in V_n(p,\epsilon)} \{\sigma_n(c,p_0) f^n(x,p)\} \le \sigma_n(c,p_0) f^n(c,p). \tag{C.31}$$

Proof: Proof by induction. Suppose that the elements of $C(f_{p_0})$ are $c_1 < c_2 < \ldots < c_m$,
for some $m \ge 1$. Assume that

$$\epsilon < \min_{i \in \{1,2,\ldots,m-1\}} |c_{i+1} - c_i|.$$

In this case, (C.31) clearly holds for $n = 1$ since $\sigma_1(c,p_0) = 1$ implies that $c$ is a relative
maximum of $f_{p_0}$ and $\sigma_1(c,p_0) = -1$ implies that $c$ is a relative minimum of $f_{p_0}$. Now
assuming that (C.31) holds for some $n = k$ where $1 \le k < n_e(c,\epsilon,p_0)$, we need to show
that (C.31) holds for $n = k + 1$.

Since $k < n_e(c,\epsilon,p_0)$, $|f^k(c,p_0) - c_i| \ge \epsilon$ for any $i \in \{1,2,\ldots,m\}$. Consequently, since
$|f^k(x,p) - f^k(c,p_0)| < \epsilon$ for any $x \in V_k(p,\epsilon)$, we see that there exists $i \in \{1,2,\ldots,m-1\}$
such that $c_i < f^k(x,p) < c_{i+1}$ for every $x \in V_k(p,\epsilon)$. In other words, the points $f^k(x,p)$ with
$x \in V_k(p,\epsilon)$ must all lie on one monotone branch of $f_p$ and:

$$\mathrm{sgn}\{D_x f(f^k(x,p),p)\} = \mathrm{sgn}\{D_x f(f^k(c,p_0),p_0)\} \tag{C.32}$$

for all $x \in V_k(p,\epsilon)$.

From our specification of $\sigma_k(c,p_0)$ we have that:

$$\sigma_{k+1}(c,p_0) = \mathrm{sgn}\{D_x f(f^k(c,p_0),p_0)\}\, \sigma_k(c,p_0). \tag{C.33}$$

We can consider four cases: $\mathrm{sgn}\{D_x f(f^k(c,p_0),p_0)\} = \pm 1$ and $\sigma_k(c,p_0) = \pm 1$. Suppose
that $\sigma_k(c,p_0) = 1$. By assumption, if $\sigma_k(c,p_0) = 1$, then

$$\sup_{x \in V_k(p,\epsilon)} f^k(x,p) \le f^k(c,p). \tag{C.34}$$

Thus, if $\mathrm{sgn}\{D_x f(f^k(c,p_0),p_0)\} = 1$, then, from (C.33), $\sigma_{k+1}(c,p_0) = 1$. Also, from
(C.32), we know that $\mathrm{sgn}\{D_x f(f^k(x,p),p)\} = 1$ for all $x \in V_k(p,\epsilon)$, so the points
$f^k(x,p)$ lie on a monotonically increasing branch of $f_p$. Combining this
result with (C.34) implies that:

$$\sup_{x \in V_{k+1}(p,\epsilon)} f^{k+1}(x,p) \le f^{k+1}(c,p).$$

On the other hand, if $\mathrm{sgn}\{D_x f(f^k(c,p_0),p_0)\} = -1$, then $\sigma_{k+1}(c,p_0) = -1$ and

$$\inf_{x \in V_{k+1}(p,\epsilon)} f^{k+1}(x,p) \ge f^{k+1}(c,p).$$

In both cases above we can see that (C.31) is satisfied for $n = k + 1$. Similarly we can
verify that (C.31) is also satisfied for $n = k + 1$ in the two cases where $\sigma_k(c,p_0) = -1$.
This proves the lemma.
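The inequality (C.31) can be spot-checked on a grid. The sketch below assumes the quadratic family $f(x,p) = p\,x(1-x)$ with $c = 1/2$, $p_0 = 3.8$, a nearby $p = 3.799$, and $\epsilon = 0.05$; these values and the function names are illustrative assumptions. It samples points $x$ near $c$, keeps those that belong to $V_n(p,\epsilon)$, and reports the largest value of $\sigma_n f^n(x,p) - \sigma_n f^n(c,p)$, which the lemma predicts is not positive. The parameter $p$ is chosen close enough to $p_0$ that $c$ itself lies in $V_n(p,\epsilon)$ for the values of $n$ tested.

```python
def f(x, p):
    return p * x * (1.0 - x)

def dfdx(x, p):
    return p * (1.0 - 2.0 * x)

def sign(v):
    return (v > 0) - (v < 0)

def orbit(x, p, n):
    """Return [x, f(x), ..., f^n(x)] under f_p."""
    pts = [x]
    for _ in range(n):
        pts.append(f(pts[-1], p))
    return pts

def sigma_n(c, p0, n):
    """sigma_n(c, p0), assuming c is a relative maximum (sigma_1 = +1)."""
    s, x = 1, f(c, p0)
    for _ in range(n - 1):
        s *= sign(dfdx(x, p0))
        x = f(x, p0)
    return s

def worst_violation(c, p0, p, eps, n, samples=2001):
    """Largest sampled value of sigma_n*f^n(x,p) - sigma_n*f^n(c,p) over V_n(p, eps)."""
    ref = orbit(c, p0, n)
    s = sigma_n(c, p0, n)
    bound = s * orbit(c, p, n)[-1]
    worst = float("-inf")
    for k in range(samples):
        x = c - eps + 2.0 * eps * k / (samples - 1)
        pts = orbit(x, p, n)
        if all(abs(a - b) < eps for a, b in zip(pts, ref)):   # x is in V_n(p, eps)
            worst = max(worst, s * pts[-1] - bound)
    return worst

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, worst_violation(0.5, 3.8, 3.799, 0.05, n))
```

A sampled check of this kind is only a sanity test: it says nothing about points between grid nodes or about values of $n$ at or beyond $n_e(c,\epsilon,p_0)$.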

Proof of theorem 3.3.1:

We are given that $f_{p_0}$ satisfies (CE1) for some $p_0 \in I_p$ and $c \in C(f_{p_0})$. Then, from
part (1) of lemma C.2.4, there exist constants $K > 1$, $C > 0$, $N_2 > 0$, $\epsilon_0 > 0$, $\delta p > 0$,
and $\lambda > 1$ such that for any $\epsilon < \epsilon_0$, if $p \in B(p_0, \delta p)$, and $|f^i(c,p) - f^i(c,p_0)| < \epsilon$ for all
$i$ satisfying $1 \le i \le n - 1$, then:

$$|D_p f^n(c,p)| > C \lambda^n \tag{C.35}$$

for any $n$ such that $N_2 < n < n_e(c,K\epsilon,p_0)$.

Now suppose that there exists $c \in C(f_{p_0})$ that favors higher parameters. Then there
exists $N_3 > 0$ such that for any $n > N_3$:

$$\mathrm{sgn}\{D_p f^n(c,p_0)\} = \sigma_n(c,p_0). \tag{C.36}$$

Set $N_1 = \max\{N_2, N_3\}$. From (C.35) and since $f$ is $C^2$ it is clear that $D_p f^n(c,p)$ cannot
change sign for any $p \in B(p_0, \delta p)$ if $N_2 < n < n_e(c,K\epsilon,p_0)$. Consequently, from
(C.36) we have that:

$$\mathrm{sgn}\{D_p f^n(c,p)\} = \sigma_n(c,p_0)$$

for any $N_1 < n < n_e(c,K\epsilon,p_0)$ if $p \in B(p_0, \delta p)$ and $|f^i(c,p) - f^i(c,p_0)| < \epsilon$ for
$1 \le i \le n - 1$. In this case:

$$\mathrm{sgn}\{f^n(c,p) - f^n(c,p_0)\} = \sigma_n(c,p_0)\, \mathrm{sgn}\{p - p_0\}. \tag{C.37}$$

Now suppose that $p < p_0$. Then from (C.37) if $\sigma_n(c,p_0) = 1$, then $f^n(c,p) < f^n(c,p_0)$
and if $\sigma_n(c,p_0) = -1$, then $f^n(c,p) > f^n(c,p_0)$ for any $p \in B(p_0, \delta p)$ such that
$|f^i(c,p) - f^i(c,p_0)| < \epsilon$ for $1 \le i \le n - 1$, provided that $N_1 < n < n_e(c,K\epsilon,p_0)$. Combining this
result with lemma C.2.5 we find that:

$$\sup_{x \in V_n(p,\epsilon)} f^n(x,p) < f^n(c,p_0) \quad \text{if } \sigma_n(c,p_0) = 1$$

$$\inf_{x \in V_n(p,\epsilon)} f^n(x,p) > f^n(c,p_0) \quad \text{if } \sigma_n(c,p_0) = -1$$

which implies that

$$\inf_{x \in V_n(p,\epsilon)} |f^n(x,p) - f^n(c,p_0)| > |f^n(c,p) - f^n(c,p_0)| \tag{C.38}$$

for any $p \in [p_0 - \delta p, p_0]$, if $N_1 < n < n_e(c,K\epsilon,p_0)$ (where $V_n(p,\epsilon)$ is as defined in the
statement of lemma C.2.5).

Finally, from lemma C.2.4 we also know that

$$\max_{N_1 < i < n} |f^i(c,p) - f^i(c,p_0)| > \min\{\epsilon, C \lambda^i |p - p_0|\} \tag{C.39}$$

if $N_1 < n < n_e(c,K\epsilon,p_0)$ and $p \in B(p_0, \delta p)$. Combining (C.38) and (C.39) we find that:

$$\inf_{x \in V_n(p,\epsilon)} |f^n(x,p) - f^n(c,p_0)| > \min\{\epsilon, C \lambda^n |p - p_0|\} \tag{C.40}$$

if $N_1 < n < n_e(c,K\epsilon,p_0)$ and $p \in [p_0 - \delta p, p_0]$. Clearly the orbit $\{f^i(c,p_0)\}_{i=0}^{\infty}$ cannot be
$\epsilon$-shadowed by an orbit of $f_p$ if

$$\inf_{x \in V_n(p,\epsilon)} |f^n(x,p) - f^n(c,p_0)| > \epsilon \tag{C.41}$$

for any finite value of $n$. Consequently from (C.40) and (C.41) we see that for any $\epsilon < \epsilon_0$,
the orbit, $\{f^i(c,p_0)\}_{i=0}^{\infty}$, cannot be $\epsilon$-shadowed by $f_p$ if

$$|p - p_0| > \frac{1}{C}\, \epsilon\, \lambda^{-n_e(c,K\epsilon,p_0)} \tag{C.42}$$

and $p \in [p_0 - \delta p, p_0]$. Setting $K' = \frac{1}{C}$, this proves the theorem.

Appendix D



Proof of theorem 3.3.2 



This appendix contains the proof for theorem 3.3.2. I have made an effort to make the 
appendix as self-contained as possible, so that the reader should be able to find most of 
the relevant definitions and explanations in this appendix. Naturally, this means that 
the appendix repeats some material found elsewhere in this report. 



D.1 Definitions and statement of theorem

Definition: Suppose that $g : I \to I$ is $C^3$ and $I \subset \mathbb{R}$. Then the Schwarzian derivative,
$Sg$, of $g$ is given by the following:

$$Sg(x) = \frac{g'''(x)}{g'(x)} - \frac{3}{2} \left( \frac{g''(x)}{g'(x)} \right)^2$$

where $g'(x)$, $g''(x)$, $g'''(x)$ here indicate the first, second, and third derivatives of $g$.
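The Schwarzian derivative is easy to evaluate when the first three derivatives are available in closed form. The sketch below assumes the quadratic map $g(x) = p\,x(1-x)$ as an example, for which $g''' = 0$ and the formula reduces to $Sg(x) = -6/(1-2x)^2 < 0$ away from the turning point; the map and the helper names are illustrative assumptions. At $x = c = 1/2$ the derivative $g'$ vanishes, so the function below raises `ZeroDivisionError` there rather than returning $-\infty$.

```python
import math

def schwarzian(d1, d2, d3, x):
    """Sg(x) = g'''(x)/g'(x) - (3/2) * (g''(x)/g'(x))**2, given callables for g', g'', g'''."""
    g1 = d1(x)
    return d3(x) / g1 - 1.5 * (d2(x) / g1) ** 2

# Derivatives of the example map g(x) = p*x*(1-x) with a hypothetical p = 3.8.
P = 3.8

def g1(x):
    return P * (1.0 - 2.0 * x)

def g2(x):
    return -2.0 * P

def g3(x):
    return 0.0

if __name__ == "__main__":
    for x in (0.1, 0.25, 0.4, 0.75):
        print(x, schwarzian(g1, g2, g3, x))
```

For maps without closed-form derivatives, finite differences could be substituted for `d1`, `d2`, `d3`, at the cost of noticeable numerical error in the third derivative.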

In this section we will primarily restrict ourselves to mappings with the following 
properties: 

(AO) g : 7 -> 7, is C 3 (7) where 7 = [0, 1], with g(0) = and g(l) = 0. 

(Af) g has one local maximum at x = c; g is strictly increasing on [0,c] and strictly 
decreasing on [c, 1]; 

(A2) g"(c)<0, \g'(0)\ > 1. 

(A3) The Schwarzian derivative of g is negative, Sg(x) < 0, over all x £ 7 (we allow 

Sg(x) = — oo). 

Under the Collet-Eckmann conditions, there exist constants $K_E > 0$ and $\lambda_E > 1$
such that for some $c \in C(g)$:

(CE1) $|Dg^n(g(c))| > K_E \lambda_E^n$

(CE2) $|Dg^n(z)| > K_E \lambda_E^n$ if $g^n(z) = c$.

for any $n > 0$.
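As a rough numerical illustration of (CE1), the following sketch accumulates $|Dg^n(g(c))|$ along the orbit of the critical value by the chain rule. The quadratic map $g(x) = p\,x(1-x)$ and the helper names are assumptions made for the example only.

```python
def logistic(x, p):
    return p * x * (1.0 - x)


def dlogistic(x, p):
    return p * (1.0 - 2.0 * x)


def critical_value_growth(p, n_max, c=0.5):
    """Return [|Dg^n(g(c))| for n = 1..n_max] for g(x) = p*x*(1 - x)."""
    x = logistic(c, p)  # start at the critical value g(c)
    deriv = 1.0
    growth = []
    for _ in range(n_max):
        deriv *= dlogistic(x, p)  # chain rule: multiply by g' at the current orbit point
        growth.append(abs(deriv))
        x = logistic(x, p)
    return growth


if __name__ == "__main__":
    print(critical_value_growth(4.0, 6))
```

At $p = 4$ the critical value lands on the repelling fixed point $x = 0$ after one step, so the derivative grows exactly like $4^n$ and (CE1) holds with, for example, $\lambda_E = 4$ and any $K_E < 1$. For other parameter values the growth has to be inspected case by case.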

We will be investigating one-parameter families of mappings, $f : I_x \times I_p \to I_x$,
where $p$ is the parameter and $I_x, I_p \subset \mathbb{R}$ are closed intervals. Let $f_p(x) = f(x, p)$ where
$f_p : I_x \to I_x$. We are primarily interested in one-parameter families of maps with the
following characteristics:

(B0) For each $p \in I_p$, $f_p : I_x \to I_x$ satisfies (A0), (A1), (A2), and (A3) where $I_x = [0, 1]$.
For each $p$, we also require that $f_p$ has a turning point at $c$, where $c$ is constant
with respect to $p$.

(B1) $f : I_x \times I_p \to I_x$ is $C^2$ for all $(x, p) \in I_x \times I_p$.

Another concept we shall need is that of the kneading invariant. Kneading invariants 
and many associated topics are discussed in Milnor and Thurston [34]. 

Definition: If $g : I \to I$ is a piecewise monotone map with exactly one turning point
at $c$, then the kneading invariant, $D(g, t)$, of $g$ is defined as follows:

$$D(g, t) = 1 + \theta_1(g)t + \theta_2(g)t^2 + \ldots + \theta_n(g)t^n + \ldots$$

where

$$\theta_n(g) = \epsilon_1(g)\epsilon_2(g)\cdots\epsilon_n(g)$$
$$\epsilon_n(g) = \lim_{x \to c} \operatorname{sgn}(Dg(g^n(x)))$$

for $n \ge 1$. If $c$ is a relative maximum of $g$, then one interpretation of $\theta_n(g)$ is that it
represents whether $g^{n+1}$ has a relative maximum ($\theta_n(g) = +1$) or minimum ($\theta_n(g) = -1$)
at $c$.

We can also order these kneading invariants in the following way. We will say that
$|D(g, t)| < |D(h, t)|$ if $\theta_i(g) = \theta_i(h)$ for $1 \le i < n$, but $\theta_n(g) < \theta_n(h)$. A kneading
invariant, $D(f_p, t)$, is said to be monotonically decreasing with respect to $p$ if $p_1 > p_0$
implies $|D(f_{p_1}, t)| \le |D(f_{p_0}, t)|$.
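The coefficients $\theta_n$ can be computed directly from the signs of the derivative along the critical orbit. The sketch below does this for the quadratic map $g(x) = p\,x(1-x)$, an assumption chosen for illustration; it also assumes that the critical orbit does not return to $c$ within the requested number of steps, so that the limit in the definition can be replaced by evaluation at $g^n(c)$.

```python
def kneading_coefficients(p, n_max, c=0.5):
    """Return [theta_1, ..., theta_n_max] for g(x) = p*x*(1 - x).

    Assumes g^n(c) != c for 1 <= n <= n_max; otherwise the limit in the
    definition is genuinely needed and a ValueError is raised.
    """
    thetas = []
    theta = 1
    x = c
    for _ in range(n_max):
        x = p * x * (1.0 - x)        # x is now g^n(c)
        slope = p * (1.0 - 2.0 * x)  # Dg evaluated at g^n(c)
        if slope == 0.0:
            raise ValueError("critical orbit returns to c; use the limit instead")
        eps = 1 if slope > 0 else -1  # epsilon_n
        theta *= eps                  # theta_n = epsilon_1 * ... * epsilon_n
        thetas.append(theta)
    return thetas


if __name__ == "__main__":
    print(kneading_coefficients(4.0, 6))
    print(kneading_coefficients(3.9, 6))
```

Two coefficient lists can then be compared entry by entry, following the ordering defined above: the first index at which they differ decides which invariant is larger.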

We are now ready to state the main result of this appendix: 

Theorem 3.3.2 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that $p_0 \in \mathrm{int}(I_p)$¹ such that $f_{p_0}$ satisfies (CE1).
Also, suppose that the kneading invariant, $D(f_p, t)$, is monotonically decreasing with
respect to $p$ in some neighborhood of $p = p_0$. Then there exist $\delta p > 0$ and $C > 0$ such
that for every $x_0 \in I_x$ there is a set, $W(x_0) \subset I_x \times I_p$, satisfying the following conditions:

(1) $W(x_0) = \{(\alpha_{x_0}(t), \beta_{x_0}(t)) \mid t \in [0, 1]\}$ where $\alpha_{x_0} : [0, 1] \to I_x$ and $\beta_{x_0} : [0, 1] \to I_p$ are
continuous and $\beta_{x_0}(t)$ is monotonically increasing with respect to $t$ with $\beta_{x_0}(0) = p_0$
and $\beta_{x_0}(1) = p_0 + \delta p$.

(2) For any $x \in I_x$, if $(x, p) \in W(x_0)$ then $|f^n(x, p) - f^n(x_0, p_0)| < C(p - p_0)^{1/3}$ for all
$n \ge 0$.

¹ Henceforth, if $A \subset \mathbb{R}$, let $\mathrm{int}(A)$ denote the interior of $A$.



D.2 Tools for maps with negative Schwarzian derivative

There has been a significant amount of interest in recent years in one-dimensional
maps, particularly maps with negative Schwarzian derivative. Below we state some
useful properties and analytical tools that have been developed to analyze these maps.
For the most part, the results are only stated here, with references provided to appropriate
proofs. We do not attempt to trace the history of the development of these results.

The only results in this section that are new are contained in lemmas D.2.11, D.2.12,
and D.2.13.

Lemma D.2.1 If $g$ satisfies (A0), (A1), and (A2) then there exist constants $K_0 > 0$
and $K_1 > 0$ such that for all $x \in I$:

(1) $K_0|x - c| \le |Dg(x)| \le K_1|x - c|$

(2) $\frac{1}{2}K_0|x - c|^2 \le |g(x) - g(c)| \le \frac{1}{2}K_1|x - c|^2$

Proof: This is clear, since $g''(c) \ne 0$.

Lemma D.2.2 If $f(x, p)$ satisfies (B0) and (B1), then there exist constants $K_0 > 0$
and $K_1 > 0$ such that for any $x \in I_x$, $y \in I_x$, $p_0 \in I_p$, and $p_1 \in I_p$:

(1) $|D_x f(x, p_0) - D_x f(y, p_0)| < K_0|x - y|$

(2) $|D_x f(x, p_0) - D_x f(x, p_1)| < K_1|p_0 - p_1|$

Proof: This is clear, since $f(x, p)$ is $C^2$ and $I_x \times I_p$ is compact.
Lemma D.2.3 (Minimum Principle). Suppose that $g$ has negative Schwarzian derivative.
Let $J = [x_0, x_1]$ be an interval on which $g$ is monotone. Then

$$|Dg(x)| > \min\{|Dg(x_0)|, |Dg(x_1)|\}$$

for all $x \in J$.

Proof: See, for example, page 154 of [33]. 
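The minimum principle can be spot-checked numerically. The sketch below applies it to iterates of the quadratic map $g(x) = p\,x(1-x)$ (an illustrative assumption); iterates of a map with negative Schwarzian derivative again have negative Schwarzian derivative, so the principle applies to $g^n$ on any interval where $g^n$ is monotone. The check is on a finite grid and uses a non-strict inequality with a small tolerance, so it is evidence rather than proof.

```python
def d_logistic_iterate(x, p, n):
    """Derivative of the n-th iterate of g(x) = p*x*(1 - x) at x, via the chain rule."""
    deriv = 1.0
    for _ in range(n):
        deriv *= p * (1.0 - 2.0 * x)
        x = p * x * (1.0 - x)
    return deriv


def satisfies_minimum_principle(x0, x1, p=4.0, n=2, samples=1000):
    """Check |Dg^n(x)| >= min(|Dg^n(x0)|, |Dg^n(x1)|) on a grid in [x0, x1].

    The caller is responsible for choosing [x0, x1] so that g^n is monotone on it.
    """
    bound = min(abs(d_logistic_iterate(x0, p, n)), abs(d_logistic_iterate(x1, p, n)))
    grid = (x0 + (x1 - x0) * k / samples for k in range(samples + 1))
    return all(abs(d_logistic_iterate(x, p, n)) >= bound - 1e-12 for x in grid)


if __name__ == "__main__":
    # For p = 4, g^2 is monotone on (0, 0.1464...) and on (0.1464..., 0.5).
    print(satisfies_minimum_principle(0.02, 0.12))
    print(satisfies_minimum_principle(0.16, 0.45))
```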

Definition: Given a map $g : I \to I$, we say that $x$ is in the basin of attraction of an orbit,
$\{y_i\}_{i=0}^{\infty}$, of $g$ if there exists an $m \ge 0$ such that $\lim_{i \to \infty}(g^{i+m}(x) - y_i) = 0$.

Lemma D.2.4 (Singer) If $g : I \to I$ is $C^3$ and has negative Schwarzian derivative,
then the basin of attraction of any stable periodic orbit contains either a critical point
or one of the boundary points of $I$.

Proof: See Singer [58]. 

Definition D.2.1 We will say that a piecewise monotone map, $g : I \to I$, has a sink
if there exists an interval $J \subset I$ such that $g^n$ is monotone on $J$ and $g^n(J) \subset J$ for
some $n > 0$.

Lemma D.2.5 If $g : I \to I$ satisfies (A0), (A1), (A2), (A3), and (CE1), then $g$ has
no sinks.

Proof: It is relatively simple to show that the existence of such a sink implies the
existence of a stable periodic point (see for example Collet and Eckmann [14], lemma
II.5.1). From Singer's theorem, we know that $g : [0, 1] \to [0, 1]$ does not have a stable
periodic orbit unless $x = 0$, $x = c$, or $x = 1$ is in the basin of attraction of that periodic
orbit. From (CE1) we know that the critical point does not tend to a stable orbit and
from (A2) we know that $x = 0$ and $x = 1$ do not tend to a stable periodic orbit. Thus
$g$ has no sinks.

Lemma D.2.6 (Koebe Inequality). Suppose that $g : I \to I$ has negative Schwarzian
derivative. Let $T = [a, b]$ be an interval on which $g$ is a diffeomorphism. Given $x \in T$,
let $L$ and $R$ be the components of $T \setminus \{x\}$. If there exists $\tau > 0$ such that:

$$\frac{|g(L)|}{|g(T)|} \ge \tau \quad \text{and} \quad \frac{|g(R)|}{|g(T)|} \ge \tau$$

then there exists $K(\tau) > 0$ such that:

$$|Dg(x)| > K(\tau) \sup_{z \in T} |Dg(z)|$$

where $K(\tau)$ depends only on $\tau$.

Proof: See, for example, theorem 3.2 in van Strien [60].

Lemma D.2.7 Let g : I -> I satisfy (A0), (Al), (A2), (A3) and (CE1). Then g 
satisfies (CE2). 

Proof: See Nowicki [44]. 

Lemma D.2.8 Let $g : I \to I$ satisfy (A0), (A1), (A2), (A3) and (CE1). There exist
$K > 0$ and $\lambda_1 > 1$ such that for any $n \ge 0$, if $g^n(x) = c$ then $|x - c| > K\lambda_1^{-n}$.

Proof: From lemma D.2.1, we know there exists $K_0 > 0$ such that $|Dg(x)| < K_0|x - c|$
for any $x \in I$. Now set $a = \sup_{x \in I} |Dg(x)|$. Then, by the chain rule, we have:

$$|Dg^n(x)| < a^{n-1} K_0 |x - c|$$

However, by lemma D.2.7, we also know that $g$ satisfies (CE2), so that $|Dg^n(x)| >
K_E \lambda_E^n$ for some constants $K_E > 0$ and $\lambda_E > 1$. Thus $a^{n-1} K_0 |x - c| > K_E \lambda_E^n$, which implies
that $|x - c| > \frac{a K_E}{K_0}\left(\frac{\lambda_E}{a}\right)^n$. This proves the lemma if we set $K = \frac{a K_E}{K_0}$ and $\lambda_1 = \frac{a}{\lambda_E}$.
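To make the statement of lemma D.2.8 concrete, the sketch below enumerates the $n$-th preimages of the critical point and measures how close they come to $c$. It assumes the quadratic map $g(x) = p\,x(1-x)$ with $p = 4$, chosen only as an example; the function names are hypothetical.

```python
import math


def preimages_of_c(n, p=4.0, c=0.5):
    """All x in [0, 1] with g^n(x) = c for g(x) = p*x*(1 - x)."""
    points = [c]
    for _ in range(n):
        previous = []
        for y in points:
            disc = 1.0 - 4.0 * y / p
            if disc < 0.0:
                continue  # y lies above the maximum p/4 and has no preimage
            root = math.sqrt(disc)
            previous.extend([(1.0 - root) / 2.0, (1.0 + root) / 2.0])
        points = previous
    return points


def min_distance_to_c(n, p=4.0, c=0.5):
    """Smallest |x - c| over all x with g^n(x) = c."""
    return min(abs(x - c) for x in preimages_of_c(n, p, c))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, min_distance_to_c(n))
```

For $p = 4$ the distances shrink geometrically, roughly halving at each step, which is the behaviour $|x - c| > K\lambda_1^{-n}$ with $\lambda_1 = 2$. This agrees with $\lambda_1 = a/\lambda_E$ if one takes $a = 4$ and growth rate $2$ along preimages of $c$ for this particular map.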

Lemma D.2.9 Let $g : I \to I$ satisfy (A0), (A1), (A2), (A3) and (CE1). Let $J_n \subset I$
be any interval such that $g^n$ is monotone on $J_n$. Then there exist constants $K > 0$ and
$\lambda_2 > 1$ such that for any $n \ge 0$:

$$|J_n| < K\lambda_2^{-n}$$

Proof: See Nowicki [44]. 

Lemma D.2.10 Let $g : I \to I$ satisfy (A0), (A1), (A2), (A3) and (CE1). Suppose that
$g^n$ is monotone on $J = [a, b]$ where $J \subset I$ and $g^n(a) = c$ for some $n > 0$. Then there
exists a constant, $K > 0$, such that for any $n > 0$:

$$\frac{|g^n(J)|}{|J|} > K$$

Proof: See lemma 6.2 in Nowicki [45].

Lemma D.2.11 Suppose that $g : I \to I$ satisfies (A0), (A1), (A2), (A3), and (CE1).
Let $x \in I$ such that $|g^i(x) - c| > \epsilon$ for $0 \le i \le n$. Then, for any $\epsilon > 0$ there exist
constants $C > 0$ and $\lambda > 1$ (independent of $x$) such that:

$$|Dg^i(x)| > C\epsilon^2\lambda^i$$

for $0 \le i \le n$.
Proof: For any $i \ge 0$, let $A_i(x)$ be the maximal interval such that $x \in A_i(x)$ and $g^i$ is
monotone on $A_i(x)$. The proof of the lemma is based on the following claim:

Claim: Let $x \in I$, and suppose that there exists $b \in A_n(x)$ such that $g^n(b) = c$ for some
$n > 0$. If $|g^i(x) - c| > \epsilon$ for $0 \le i \le n$, then there exist $C_0 > 0$ and $\lambda > 1$ (independent
of $x$) such that:

$$|Dg^{n+1}(x)| > C_0\epsilon^2\lambda^{n+1}.$$

We shall now describe the proof of the lemma using this claim, leaving the proof of
the claim for later.

Fix $x \in I$ and $i \le n$. Suppose that $A_i(x) = [a, a']$ and let $x_i = g^i(x)$, $a_i = g^i(a)$,
and $a'_i = g^i(a')$. For definiteness, assume that $|x_i - a_i| \le |a'_i - x_i|$ (the other case is
analogous). Since $A_i(x)$ is maximal, each endpoint of $A_i(x)$ must map either (1) into
the critical point, or (2) into the boundary of $I$. If case (2) is true, there must exist
$k < i$ such that $g^k(a) = 0$ or $g^k(a) = 1$ (since $I = [0, 1]$ by (A0)). This means either
$a = 0$, $a = 1$, or $g^j(a) = c$ for some $j < k$. If $g^j(a) = c$ then case (1) is also satisfied.
Otherwise, if $a = 0$ or $a = 1$, then $g^i(A_i(x)) \cap \{c\} \ne \emptyset$, and the lemma may be proved
by a direct application of the claim described above.

Otherwise, if case (1) is true, there must exist $k < i$ such that $g^k(a) = c$. By (CE1),
we know there exist constants $K_E > 0$ and $\lambda_E > 1$ (independent of $i$ and $k$) such that:

$$|Dg^{i-k-1}(g^{k+1}(a))| > K_E\lambda_E^{i-k-1} \qquad \text{(D.1)}$$

Now set $y \in [a, a']$ so that $y_i = g^i(y) = \frac{1}{2}(a_i + a'_i)$. By the Koebe Inequality, since
$|y_k - a_k| < |a'_k - y_k|$, there exists $K_0 = K(\tau = \frac{1}{2}) > 0$ such that:

$$|Dg^{i-k-1}(g^{k+1}(y))| > K_0|Dg^{i-k-1}(g^{k+1}(a))|$$

Combining this with (D.1) we have:

$$|Dg^{i-k-1}(g^{k+1}(y))| > K_0 K_E\lambda_E^{i-k-1} \qquad \text{(D.2)}$$

Also, since $|x_i - a_i| \le |a'_i - x_i|$, we know $x_i \in [a_i; y_i]$ (where $[a; b]$ means either $[a, b]$ or
$[b, a]$, whichever is appropriate). Thus by using the minimum principle with (D.1) and
(D.2) we find that there exists $K_1 > 0$ such that:

$$|Dg^{i-k-1}(g^{k+1}(x))| > K_1\lambda_E^{i-k-1}. \qquad \text{(D.3)}$$

We are now ready to apply the claim. It is clear that $a \in A_k(x)$. Since $g^k(a) = c$,
the claim implies that there exist $C_0 > 0$ and $\lambda > 1$ such that:

$$|Dg^{k+1}(x)| > C_0\epsilon^2\lambda^{k+1} \qquad \text{(D.4)}$$
Figure D.1: The interval $g^k(A(x)) = [a_k, a'_k]$ and associated variables are shown. The figure
is drawn assuming that $a'_k > a_k$, $b \in (a, a')$, and that $x \in [a, b]$.
Applying the minimum principle to this interval and using (D.5) and (D.6), we find
that there exists $K_3 > 0$ such that:

$$|Dg^k(y')| > K_3\lambda_E^k. \qquad \text{(D.7)}$$

Also, for any $\epsilon > 0$, we know from lemma D.2.1 that there exists $K_4 > 0$ such that

$$|Dg(y'_k)| > \tfrac{1}{2}K_4\epsilon. \qquad \text{(D.8)}$$

From (D.7) and (D.8) and setting $K_5 = \frac{1}{2}K_3 K_4$, we have:

$$|Dg^{k+1}(y')| > K_5\epsilon\lambda_E^{k+1}. \qquad \text{(D.9)}$$

Also, since $g^k(a) = c$, from (CE1) we know that $|Dg^{n-k-1}(g^{k+1}(a))| > K_E\lambda_E^{n-k-1}$.
Since $g^n(b) = c$, we know from (CE2) that $|Dg^{n-k-1}(g^{k+1}(b))| > K_E\lambda_E^{n-k-1}$. Thus, by
the minimum principle, $|Dg^{n-k-1}(g^{k+1}(y'))| > K_E\lambda_E^{n-k-1}$. Combining this with (D.9) we
find:

$$|Dg^n(y')| > K_5 K_E\epsilon\lambda_E^n. \qquad \text{(D.10)}$$

From (CE2) we also know that

$$|Dg^n(b)| > K_E\lambda_E^n. \qquad \text{(D.11)}$$

In addition, since $|x_k - a_k| > \epsilon$, we know that $x_k \in [y'_k; b_k]$ so that $x \in [y'; b]$. Thus,
from (D.10), (D.11), and the minimum principle, we can conclude that there exists
$K_6 > 0$ such that:

$$|Dg^n(x)| > K_6\epsilon\lambda_E^n.$$

Finally, since $|g^n(x) - c| > \epsilon$, we can use lemma D.2.1 to bound $|Dg(g^n(x))| > K_4\epsilon$ for
$K_4 > 0$. Consequently there exists $C_1 > 0$ such that:

$$|Dg^{n+1}(x)| > C_1\epsilon^2\lambda_E^{n+1} \qquad \text{(D.12)}$$

which proves the claim for the case where $g^k(a) = c$ for some $k < n$.

The other possibility is that $g^k(a) \in \mathrm{Bd}(I)$ for some $k < n$, where $\mathrm{Bd}(I)$ denotes
the boundary of $I$. But this implies that either $a \in \mathrm{Bd}(I)$ or possibly that $g^{k-1}(a) = c$.
The possibility where $g^{k-1}(a) = c$ has already been covered by the previous case. On
the other hand, if $a \in \mathrm{Bd}(I)$ then by (A2) there exists $\lambda_0 > 1$ such that $|Dg^n(a)| > \lambda_0^n$.
From (CE2) we also know that $|Dg^n(b)| > K_E\lambda_E^n$. Thus, by the minimum principle,
there exist $K_7 > 0$ and $\lambda_1 > 1$ such that $|Dg^n(x)| > K_7\lambda_1^n$ for any $x \in [a, b]$. Then,
since $|g^n(x) - c| > \epsilon$ we can use lemma D.2.1 to bound $|Dg(g^n(x))|$ so that there exists
$C_2 > 0$ satisfying:

$$|Dg^{n+1}(x)| > C_2\epsilon\lambda_1^{n+1} \qquad \text{(D.13)}$$

Combining (D.12) and (D.13) shows that we can pick $C_0 > 0$ and $\lambda > 1$ to prove the
claim.




Lemma D.2.12 Let $g : I \to I$ satisfy (A0), (A1), (A2), (A3), and (CE1). Suppose
there exist $a \in I$ and $n > 0$ such that $g^n(a) = c$. Given any $\alpha > 0$ sufficiently small,
either $\min_{0 \le i < n} |g^i(a) - c| > \alpha$ or there exist $b \in I$, $n' > 0$, and constants $K > 0$ and
$K' > 0$ such that $g^{n'}(b) = c$, $|b - a| < K\alpha$, and $n' < n - K'\log\frac{1}{\alpha}$.

Proof: Suppose that $\min_{0 \le i < n} |g^i(a) - c| < \alpha$. Then there exists $m < n$ such that
$|g^m(a) - c| < \alpha$ and $|g^i(a) - c| > \alpha$ for $0 \le i < m$.

Since $g^m(a)$ approaches close to $c$, we can bound $m$ away from $n$ using lemma D.2.8:

$$n - m > \frac{\log\frac{1}{\alpha}}{\log\lambda_1} \qquad \text{(D.14)}$$

where $\lambda_1 > 1$ is a constant dependent only on $g$.

We now consider two possibilities: (1) there exists $b \in I$ such that $g^m(b) = c$ and $g^m$
is monotone on $[a; b]$, or (2) there exist $b \in I$ and $k < m$ such that $g^m$ is monotone on
$[a; b]$, $g^k(b) = c$, and $g^m(b) \in [g^m(a); c]$. One of these two cases must be true.

Let $a_i = g^i(a)$ and $b_i = g^i(b)$ for $i \ge 0$. In the first case, from lemma D.2.10, there
exists $K_3 > 0$ such that:

$$|b - a| < \frac{1}{K_3}|a_m - c| < \frac{\alpha}{K_3}. \qquad \text{(D.15)}$$

Also, from (D.14) we know $m < n - \frac{\log(1/\alpha)}{\log\lambda_1}$. Thus, in this case the lemma is proved if we
set $K = \frac{1}{K_3}$, $K' = \frac{1}{\log\lambda_1}$, and $n' = m$.

Now we address the second case. From lemma D.2.1 we know there exist $K_0 > 0$
and $K_1 > 0$ such that $K_0|x - c|^2 < |g(x) - g(c)| < K_1|x - c|^2$. Thus if we set $K_2 = \sqrt{K_1/K_0}$
we see that for any $\delta > 0$ and $\delta^* > K_2\delta$ we have that:

$$g([c \pm \delta; c]) \subset g([c; c \pm \delta^*]) \qquad \text{(D.16)}$$

where the $\pm$ notation means that the relation holds for all four possible combinations.
Also note that since $b_k = c$ and $b_m \in [a_m; c]$ we have:

$$[a_{k+1}; b_{k+1}] = g([a_k; b_k]) = g([a_k; c]) \qquad \text{(D.17)}$$

$$[a_{m+1}; b_{m+1}] = g([a_m; b_m]) \subset g([a_m; c]). \qquad \text{(D.18)}$$

We now assert that $|a_k - b_k| < K_2\alpha$. Suppose to the contrary that $|a_k - c| = |a_k - b_k| >
K_2\alpha > K_2|a_m - c|$. Then, combining this with (D.16), (D.17), and (D.18) implies that:

$$[a_{m+1}; b_{m+1}] \subset [a_{k+1}; b_{k+1}]. \qquad \text{(D.19)}$$
However, since $g$ satisfies (CE1), it cannot have any sinks (from lemma D.2.5). In
particular this means:

$$[a_{m+1}; b_{m+1}] \not\subset [a_{k+1}; b_{k+1}]$$

if $k < m$, since $g^{m+1}$ is monotone on $[a; b]$ if $\alpha > 0$ is sufficiently small. Thus, (D.19)
cannot be true so we conclude that:

$$|a_k - b_k| < K_2\alpha.$$

Finally, since $b_k = c$, we can use lemma D.2.10 to show that there exists $K_3 > 0$ such that:

$$|b - a| < \frac{1}{K_3}|a_k - b_k| < \frac{1}{K_3}K_2\alpha \qquad \text{(D.20)}$$

Thus combining (D.14) and (D.20) we see that the lemma is satisfied if we set $K = \frac{K_2}{K_3}$,
$K' = \frac{1}{\log\lambda_1}$, and $n' = k < m < n - \frac{\log(1/\alpha)}{\log\lambda_1}$.

Thus, combining the results from (D.15) and (D.20) proves the lemma.

Lemma D.2.13 Suppose $g : I \to I$ satisfies (A0), (A1), (A2), (A3), and (CE1).
Then there exist $C > 0$ and $\epsilon_0 > 0$ so that given any positive $\epsilon < \epsilon_0$, and any $x \in
I$ such that $x + \epsilon \in I$, there is a $y \in (x, x + \epsilon)$ such that $N(y, g) < \infty$ and
$\min_{0 \le i < N(y,g)} |g^i(y) - c| > C\epsilon$. Similarly if $x - \epsilon \in I$, then there exists $y' \in (x - \epsilon, x)$ such
that $N(y', g) < \infty$ and $\min_{0 \le i < N(y',g)} |g^i(y') - c| > C\epsilon$.

Proof: We show the proof for $y \in (x, x + \epsilon)$. The proof for $y' \in (x - \epsilon, x)$ is exactly
analogous.

Our plan is to apply lemma D.2.12 as many times as necessary to find an appropriate
$y$ to satisfy the lemma. In other words, lemma D.2.12 implies that given any $y_i \in I$ such
that $n_i = N(y_i, g) < \infty$ and $\min_{0 \le j < n_i} |g^j(y_i) - c| < \alpha$, there exists a $y_{i+1} \in I$ such
that $|y_{i+1} - y_i| < K\alpha$ and

$$n_{i+1} = N(y_{i+1}, g) < n_i - K'\log\frac{1}{\alpha} \qquad \text{(D.21)}$$

for positive constants $K$ and $K'$. Thus given $y_0$, we can generate a sequence $\{y_i\}_{i=0}^{m}$ in
this manner for increasing $i$ until $i = m$ such that

$$\min_{0 \le i < n_m} |g^i(y_m) - c| > \alpha. \qquad \text{(D.22)}$$

For example, given any $\alpha > 0$ and any $x_0 \in I$, we know from lemma D.2.9 that if
$x_0 + \alpha \in I$, then there exists $y_0 \in (x_0, x_0 + \alpha)$ such that $g^{n_0}(y_0) = c$ for some integer $n_0$
satisfying:

$$n_0 < \frac{\log\frac{1}{\alpha}}{\log\lambda_2} + 1 \qquad \text{(D.23)}$$
where $\lambda_2 > 1$ is a constant dependent only on $g$. If we generate $\{y_i\}_{i=0}^{m}$ from the $y_0$
specified above, then from (D.21) and (D.23) we find that:

$$n_i < \left(\frac{1}{\log\lambda_2} - iK'\right)\left(\log\frac{1}{\alpha}\right) + 1 \qquad \text{(D.24)}$$

for all $0 \le i \le m$. Set $M = \frac{1}{K'\log\lambda_2} + 1$. Then for sufficiently small $\alpha > 0$ we find that
$m < M$ because otherwise (D.24) would imply that $n_i < 0$ for $i \ge M$.

So given $x \in I$ and positive $\epsilon < \epsilon_0$ from the statement of the lemma, set $x_0 =
x + KM\alpha$ and $\alpha = \frac{\epsilon}{2KM + 1}$. Note that we can choose $\epsilon_0 > 0$ to ensure that $\alpha > 0$ is
sufficiently small so that the above arguments work. Also, note that since $x_0 + \alpha =
x + \frac{KM + 1}{2KM + 1}\epsilon < x + \epsilon$, if $x + \epsilon \in I$ then $x_0 + \alpha \in I$. From our choice of $y_0 \in (x_0, x_0 + \alpha)$,
we also know that since $|y_{i+1} - y_i| < K\alpha$, we have $|y_m - y_0| < Km\alpha$. Consequently
$y_m > x + KM\alpha - Km\alpha > x$ and $y_m < x + KM\alpha + \alpha + Km\alpha < x + (2KM + 1)\alpha = x + \epsilon$.
Thus $y_m \in (x, x + \epsilon)$ and from (D.22), we have that $\min_{0 \le i < n_m} |g^i(y_m) - c| > \alpha = C\epsilon$
where $C = \frac{1}{2KM + 1}$. Setting $y = y_m$, this proves the lemma.



D.3 Analyzing preimages 

In this section we will investigate one-parameter families of mappings, $\{f_p \mid p \in I_p\}$, that
satisfy (B0) and (B1). Our discussion depends on an examination of the preimages of
the critical point, $x = c$, in $I_x \times I_p$ space. We first need to introduce some notation in
order to describe the relevant concepts.

For the remainder of this section, $\{f_p \mid p \in I_p\}$ will refer to a given one-parameter
family of mappings satisfying (B0) and (B1). We will consider the set of preimages,
$P(n) \subset I_x \times I_p$, satisfying:

$$P(n) = \{(x, p) \mid f^i(x, p) = c \text{ for some } 0 \le i \le n\}.$$
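A slice of $P(n)$ at a fixed parameter value can be enumerated by repeatedly inverting the map. The sketch below does this for the quadratic family $f(x, p) = p\,x(1-x)$ with $c = 1/2$, which is an illustrative assumption rather than a family singled out in this appendix; `preimage_slice` is a hypothetical helper name.

```python
import math


def preimage_slice(p, n, c=0.5):
    """Sorted list of x with f^i(x, p) = c for some 0 <= i <= n, f(x, p) = p*x*(1 - x)."""
    level = [c]
    found = [c]
    for _ in range(n):
        next_level = []
        for y in level:
            disc = 1.0 - 4.0 * y / p
            if disc < 0.0:
                continue  # y exceeds the maximum value p/4, so it has no preimage
            root = math.sqrt(disc)
            next_level.extend([(1.0 - root) / 2.0, (1.0 + root) / 2.0])
        found.extend(next_level)
        level = next_level
    return sorted(found)


if __name__ == "__main__":
    for p in (3.0, 3.5, 4.0):
        print(p, len(preimage_slice(p, 3)))
```

Scanning `p` over a grid and plotting the returned points gives a picture of the curves that make up $P(n)$ in $I_x \times I_p$ space. The number of points in a slice changes with `p` because some points have no preimage when they lie above the maximum value $p/4$ of the map.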

First of all, it will be useful to have a way of specifying particular "sections" of
preimages, $R(n, x_0, p_0)$, extending from a particular point $(x_0, p_0) \in I_x \times I_p$. So let
$R(n, x_0, p_0) \subset I_x \times I_p$ denote the set of path-connected elements, consisting of all points
$(x', p') \in I_x \times I_p$ such that there exists a continuous function $g_{(x_0,p_0)} : I_p \to I_x$ satisfying
$g_{(x_0,p_0)}(p_0) = x_0$, $g_{(x_0,p_0)}(p') = x'$, and

$$\{(x, p) \mid x = g_{(x_0,p_0)}(p),\ p \in [p_0; p']\} \subset P(n),$$

where $[p_0; p']$ may denote either $[p_0, p']$ or $[p', p_0]$, whichever is appropriate.

A roadmap of the development in this section is as follows. In lemma D.3.1 we show
that $P(n)$ cannot have isolated points or curve segments. Instead, each point in $P(n)$
must be part of a path-connected set of points in $P(n)$ that stretches for the length of the
parameter space, $I_p$. In lemma D.3.2 we demonstrate that if the kneading invariant of $f_p$,
$D(f_p, t)$, is monotonically decreasing (or increasing), then $P(n)$ must have a branching
tree-like structure. As we travel along one direction in parameter space, branches of $P(n)$
must either always merge or always split away from each other. For example, if $D(f_p, t)$
is monotonically decreasing, then branches of $P(n)$ can only split away from each other
as we increase the parameter $p$. Thus in this case, $R(n, y_-, p_0)$ and $R(n, y_+, p_0)$ cannot
intersect each other for $p > p_0$ if $y_+ \ne y_-$ and $y_+, y_- \in I_x$.

In lemmas D.3.3, D.3.4, D.3.5, and D.3.6 we develop bounds on the derivatives for
differentiable branches of $R(n, x_0, p_0)$. The basic idea behind lemma D.3.7 is that we
can use these bounds to demonstrate that for maps, $f_p$, with kneading invariants that
decrease monotonically in parameter space, there exist constants $C > 0$ and $\delta p > 0$ such
that if $x_0 \in I_x$ and

$$U(p) = \{x \mid |x - x_0| < C(p - p_0)^{1/3}\} \qquad \text{(D.25)}$$

for any $p \in I_p$, then for any $p' \in [p_0, p_0 + \delta p]$, there exists $x'_+ \in U(p')$ such that
$(x'_+, p') \in R(n_+, y_+, p_0)$ for some $y_+ > x_0$ and $n_+ > 0$, assuming that $f^{n_+}(y_+, p_0) = c$.
Likewise there exists $x'_- \in U(p')$ such that $(x'_-, p') \in R(n_-, y_-, p_0)$ for some $y_- < x_0$ and
$n_- > 0$ where $f^{n_-}(y_-, p_0) = c$.

However, setting $n = \max\{n_+, n_-\}$, since $R(n, y_-, p_0)$ and $R(n, y_+, p_0)$ do not intersect
each other for $p > p_0$ and $y_- \ne y_+$, we also know that for any $y_- < y_+$, there is
a region in $I_x \times I_p$ space bounded by $R(n, y_-, p_0)$, $R(n, y_+, p_0)$, and $p > p_0$. Given any
$x_0 \in I_x$, take the limit of this region as $y_- \to x_0$, $y_+ \to x_0$, and $n \to \infty$. Call the
resulting region $S(x_0)$. Observe that $S(x_0)$ is a connected set that is invariant under $f$
and is nonempty for every parameter value $p \in I_p$ such that $p > p_0$. Thus since $S(x_0)$ is
bounded from (D.25), there exists a set of points, $S(x_0)$, in combined state and parameter
space that "shadow" any trajectory, $\{f^i_{p_0}(x_0)\}_{i=0}^{\infty}$, of $f_{p_0}$. Finally we observe that a
subset of $S(x_0)$ can be represented by the form given for $W(x_0)$.
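The notion of shadowing used in this roadmap can be explored with a crude finite-horizon experiment. The sketch below is only a brute-force illustration under stated assumptions: it uses the quadratic family $f(x, p) = p\,x(1-x)$, a finite number of steps, and a grid search over initial conditions near $x_0$. It is not the construction used in the proofs, and the names `orbit` and `shadowing_error` are hypothetical.

```python
def orbit(x, p, steps):
    """Return [x, f(x, p), ..., f^steps(x, p)] for f(x, p) = p*x*(1 - x)."""
    points = [x]
    for _ in range(steps):
        x = p * x * (1.0 - x)
        points.append(x)
    return points


def shadowing_error(x0, p0, p, steps, grid=2001, radius=0.05):
    """Smallest max_n |f^n(x, p) - f^n(x0, p0)| over 0 <= n <= steps,
    minimised over a grid of initial conditions x within `radius` of x0."""
    reference = orbit(x0, p0, steps)
    candidates = [x0] + [x0 - radius + 2.0 * radius * k / (grid - 1) for k in range(grid)]
    best = float("inf")
    for x in candidates:
        if not 0.0 <= x <= 1.0:
            continue  # stay inside I_x = [0, 1]
        err = max(abs(a - b) for a, b in zip(orbit(x, p, steps), reference))
        best = min(best, err)
    return best


if __name__ == "__main__":
    x0, p0, dp = 0.3, 3.9, 1e-3
    for steps in (10, 20, 40):
        print(steps, shadowing_error(x0, p0, p0 + dp, steps), shadowing_error(x0, p0, p0 - dp, steps))
```

Comparing the two printed columns for parameter offsets of opposite sign is one way to look for the directional asymmetry discussed in this appendix. The grid resolution limits how small an error can be detected, so the numbers should be read as upper bounds on the true finite-horizon shadowing distance.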

We are now ready to examine these arguments more formally. 

Lemma D.3.1 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that $x_0 \in I_x$ satisfies $n = N(x_0, f_{p_0}) < \infty$ for some
$p_0 \in \mathrm{int}(I_p)$. Then the following statements hold true:

(1) There exist a closed interval $J_p(x_0, p_0) \subset I_p$ and a $C^2$ function $h_{(x_0,p_0)} : J_p(x_0, p_0) \to
I_x$ such that $p_0 \in \mathrm{int}(J_p(x_0, p_0))$, $h_{(x_0,p_0)}(p_0) = x_0$, and $f^n(h_{(x_0,p_0)}(p), p) = c$ for all
$p \in J_p(x_0, p_0)$. Also, if $J_p(x_0, p_0) = [a, b]$ then $a$ is either an endpoint of $I_p$ or
$f^i(h_{(x_0,p_0)}(a), a) = c$ for some $i < n$, and similarly for $b$.

(2) There exists a continuous function, $g_{(x_0,p_0)} : I_p \to I_x$, such that $g_{(x_0,p_0)}(p_0) = x_0$ and

$$\{(x, p) \mid x = g_{(x_0,p_0)}(p),\ p \in I_p\} \subset P(n).$$
Proof: Suppose that $f^{m_0}(x_0, p_0) = c$ for $m_0 \le n$ and $f^i(x_0, p_0) \ne c$ for $0 \le i < m_0$. Then
define the set $S(x_0, p_0) \subset I_x \times I_p$ to be the maximal path-connected set satisfying the
following conditions:

(1) $(x_0, p_0) \in S(x_0, p_0)$

(2) $(x, p) \in S(x_0, p_0)$ if $p \in I_p$ and $f^i(x, p) \ne c$ for every $0 \le i < m_0$.

Note that $S(x_0, p_0)$ must contain an open neighborhood around $(x_0, p_0)$ because of the
continuity of $f$.

Now let $\bar{S}(x_0, p_0)$ be the closure of $S(x_0, p_0)$, define $Q_{(x_0,p_0)}(p) = \{x \mid (x, p) \in
\bar{S}(x_0, p_0)\}$, and let

$$J_p(x_0, p_0) = \left[\inf_{(x,p) \in \bar{S}(x_0,p_0)} p,\ \sup_{(x,p) \in \bar{S}(x_0,p_0)} p\right] \qquad \text{(D.26)}$$

We claim that $Q_{(x_0,p_0)}(p) \subset I_x$ must consist of a single connected interval for every
$p \in J_p(x_0, p_0)$. Otherwise, if there existed $x_1 < x_2 < x_3$ such that $x_1 \in Q_{(x_0,p_0)}(p)$,
$x_2 \notin Q_{(x_0,p_0)}(p)$, and $x_3 \in Q_{(x_0,p_0)}(p)$, then there would exist $i < m_0$ such that $c \in
[f^i(x_1, p); f^i(x_3, p)]$. But since $(x_1, p) \in S(x_0, p_0)$ and $(x_3, p) \in S(x_0, p_0)$ there exists a
connected path, $\{(x(t), p(t)) \mid t \in [0, 1]\} \subset S(x_0, p_0)$, joining $(x_1, p)$ and $(x_3, p)$,
where $x(t) : [0, 1] \to I_x$ and $p(t) : [0, 1] \to I_p$ are continuous functions. Along this path,
$f^i(x(t), p(t))$ is continuous and $f^i(x(t), p(t)) \ne c$ for any $t \in [0, 1]$. This contradicts the
assertion that $c \in [f^i(x_1, p); f^i(x_3, p)]$ and proves the claim that $Q_{(x_0,p_0)}(p)$ must consist
of a single interval for all $p \in J_p(x_0, p_0)$.

Returning to the proof of the lemma we find that, since $(x, p) \in S(x_0, p_0)$ implies
$f^i(x, p) \ne c$ for every $0 \le i < m_0$, we know that $f_p^{m_0}(x)$ must be strictly monotonic
on $Q_{(x_0,p_0)}(p)$ for each $p \in J_p(x_0, p_0)$. Thus for each $p \in J_p(x_0, p_0)$ there is exactly one
$x \in Q_{(x_0,p_0)}(p)$ such that $f^{m_0}(x, p) = c$. Consequently there exists a function $h_{(x_0,p_0)} :
J_p(x_0, p_0) \to I_x$ such that $f^{m_0}(h_{(x_0,p_0)}(p), p) = c$ and $h_{(x_0,p_0)}(p) \in Q_{(x_0,p_0)}(p)$ if $p \in J_p(x_0, p_0)$.
Furthermore, the function $h_{(x_0,p_0)}$ must be $C^2$ for $p \in \mathrm{int}(J_p(x_0, p_0))$ since $f(x, p)$ is
$C^2$ and $f_p^{m_0}(x)$ is strictly monotonic for $x \in Q_{(x_0,p_0)}(p)$. Finally, from our choice of
$S(x_0, p_0)$ and $h_{(x_0,p_0)}(p)$, it is clear that $(h_{(x_0,p_0)}(p), p) \in P(n)$ for all $p \in J_p(x_0, p_0)$. This
proves property (1) of the lemma.

We now have to construct a continuous $g_{(x_0,p_0)}(p)$ that is valid over the entire range
of $I_p$. Suppose that $J_p(x_0, p_0) = [p_{-1}, p_1]$. Let $h_{(x_0,p_0)}(p_1) = x_1$. From our specification
of $S(x_0, p_0)$ it is clear that $f^j(x_1, p_1) = c$ for some $j < m_0$. Thus there exists $m_1 <
m_0$ such that $f^{m_1}(x_1, p_1) = c$ and $f^i(x_1, p_1) \ne c$ for $0 \le i < m_1$. Consequently, we
can use the same arguments as before to consider the set $S(x_1, p_1)$, and generate a
continuous function, $h_{(x_1,p_1)}(p)$, such that $(h_{(x_1,p_1)}(p), p) \in P(n)$ for all $p \in J_p(x_1, p_1)$,
where $J_p(x_1, p_1) \supset [p_1, p_2]$ for some $p_2 > p_1$. This argument can be carried out repeatedly
for $m_0 > m_1 > m_2 > \ldots$ and so forth. However, since $f^{m_i}(x_i, p_i) = c$ and the $m_i$ are
strictly decreasing non-negative integers, we see that
$\sup(I_p) \in J_p(x_i, p_i)$ for some $i \le n$. Similarly we can also use the same arguments for
$p < p_0$, working in the opposite direction in parameter space in order to successively
generate $(h_{(x_{-i},p_{-i})}(p), p) \in P(n)$ for increasing values of $i$. Consequently, there exist
$-n \le a \le 0$ and $0 \le b \le n$ such that $I_p = \bigcup_{i=a}^{b} J_p(x_i, p_i)$. Now if we set $g_{(x_0,p_0)} : I_p \to I_x$ to be

$$g_{(x_0,p_0)}(p) = h_{(x_i,p_i)}(p) \quad \text{if } p \in J_p(x_i, p_i), \qquad \text{(D.27)}$$

we can see that $g_{(x_0,p_0)}(p)$ is continuous since $h_{(x_i,p_i)}(p)$ is $C^2$ if $p \in \mathrm{int}(J_p(x_i, p_i))$, and
$h_{(x_i,p_i)}(p_i) = h_{(x_{i-1},p_{i-1})}(p_i)$ for all $a < i \le b$. Finally, since $(h_{(x_i,p_i)}(p), p) \in P(n)$ for all
$a \le i \le b$ we see that $g_{(x_0,p_0)}(p)$ has all the properties guaranteed by the lemma.

Lemma D.3.2 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that there exists $\delta p > 0$ such that the kneading invariant
$D(f_p, t)$ is monotonically decreasing for $p \in [p_0, p_0 + \delta p]$. Then

$$R(n, y_0, p_0) \cap R(n, y_1, p_0) \cap (I_x \times [p_0, p_0 + \delta p]) = \emptyset \qquad \text{(D.28)}$$

for any $y_0 \ne y_1$ and any $n > 0$ such that $y_0 \in I_x$ and $y_1 \in I_x$.

Proof: Suppose that there exist $y_0 \in I_x$ and $y_1 \in I_x$ such that

$$R(n, y_0, p_0) \cap R(n, y_1, p_0) \cap (I_x \times [p_0, p_0 + \delta p]) \ne \emptyset \qquad \text{(D.29)}$$

for some $n > 0$ where $N(y_0, f_{p_0}) \le n$ and $N(y_1, f_{p_0}) \le n$. It is sufficient to show that
this statement contradicts the condition that $D(f_p, t)$ is monotonically decreasing for
$p \in [p_0, p_0 + \delta p]$.

Let $p' > p_0$ be the smallest value such that there exists a pair of points $y_2 \in I_x$ and
$y_3 \in I_x$ with $y_2 < y_3$ satisfying:

$$R(n, y_2, p_0) \cap R(n, y_3, p_0) \cap (I_x \times [p_0, p']) \ne \emptyset. \qquad \text{(D.30)}$$

Assuming that (D.29) is true, we know that $p' \le p_0 + \delta p$. Now fix $y_2$ in
(D.30) and let $y_3$ take on all values such that $y_3 > y_2$ and $y_3 \in I_x$. Let
$y_4$ be the smallest possible value of $y_3$ that satisfies (D.30) and set $x' \in I_x$ such that
$(x', p') \in R(n, y_2, p_0)$ and $(x', p') \in R(n, y_4, p_0)$.

Let G2 be the set of all continuous functions, g 2 : I p — > P, such that g 2 (p') = x' and 
f(92{p)iP) £ -R(ra, y 2} po) for all p G ip. By lemma D.3.1, there exist at least one element 
in G 2 . Set 

g 2 (p) = sup g 2 (p). (D.31) 

32eG 2 

Clearly g 2 (x) must be also be continuous function that satisfies g 2 (p') = x' and f(g 2 (p),p) G 
P(n, y 2} po) for allp > p if p G i p . Similarly we can define g±(x) in analagous way, making 

g±(x) = inf g±(x) (D.32) 

3 4 eG 4 
where $G_4$ is the set of all functions $g_4 : I_p \to I_x$ satisfying $g_4(p') = x'$ and $f(g_4(p), p) \in
R(n, y_4, p_0)$ for all $p$ satisfying $p \in I_p$ and $p > p_0$.

Because of our choice of p 1 } we know that g 2 (p) 7^ 9a{p) if P G [po,p')- Now let 

7 2 = {(/Mp),p),p)Ipg/ p } 

^4 = {(f(94(p),p),P)\p£l p }- 

And let M £ I x X I p he the interior of the region bounded by J 2 U J4 U (7^ X {po})- From 
our choice of p' we know that 

J 2 r\R(n } y } p )n(I x x \po,p')) = 
7 4 ni?(n,j/,p )n(7 :c x \po,p')) = 

for any y ^ y 2 and y ^ j/ 4 . From our choice of j/ 4 we also know that (x',p') ^ R(n } y } po) 
for any y G (2/2,2/4)- Thus we conclude that no R(n } y } p ) intersects Af for any y G 7^ 
satisfying y ^ y 2} y ^ J/4, and N(y } f Po ) < n. Finally, from our choice of of #2 (2) and (/ 4 (x) 
it is also apparent that neither R(n } y 2} p ) nor i?(n, j/ 4 ,p ) intersects Af. Consequently, 
we see that: 

AfnP(n) = 0. (D.33) 

Now let 

M,(p) = {x|(x,p)GM} 

where Af denotes the closure of Af. From (D.33) we know that /' is strictly monotonic 
on M x (p) for any < i < n. Note in particular that this implies that there can exist no 
< i < n such that 

gi(p) = g\(p) = c (D.34) 

for any p G \po,p')- 

Now let $\{\alpha_k\}_{k=0}^{\infty}$ be a monotonically increasing sequence such that $\alpha_0 = p_0$ and
$\alpha_k \to p'$ as $k \to \infty$. We know that for any $p \in [p_0; p']$, there exists a $k \le n$ such that
$f^k(g_2(p), p) = c$. Thus consider the sequence $\{b_k\}_{k=0}^{\infty}$ where $b_k = N(g_2(\alpha_k), f_{\alpha_k})$. Since $b_k$
can only take on a finite number of values ($0 \le b_k \le n$), we know there exists an infinite
subsequence $\{k_i\}_{i=0}^{\infty}$ such that $b_{k_i} = b$ if $i \ge 0$ for some $0 \le b \le n$. This implies that
$f^b(g_2(\alpha_{k_i}), \alpha_{k_i}) = c$ for all $i \ge 0$. Also, since $f$ is continuous and $\alpha_{k_i} \to p'$ as $i \to \infty$, we
can also conclude that

$$f^b(g_2(p'), p') = f^b(x', p') = c. \qquad \text{(D.35)}$$



00 




We also play the same game with $g_4$ instead of $g_2$. Consider the sequence $\{d_i\}_{i=0}^{\infty}$
where $d_i = N(g_4(\alpha_{k_i}), f_{\alpha_{k_i}})$. We know that $d_i$ can only take on a finite number of values
so there exist an infinite subsequence, $\{i_j\}_{j=0}^{\infty}$, and a number $0 \le d \le n$ such that $d_{i_j} = d$
for all $j \ge 0$. In this case, $f^d(g_4(\alpha_{k_{i_j}}), \alpha_{k_{i_j}}) = c$ for all $j \ge 0$. Since $\alpha_{k_{i_j}} \to p'$ as $j \to \infty$
this implies that

$$f^d(g_4(p'), p') = f^d(x', p') = c. \qquad \text{(D.36)}$$

However, from (D.34) we also know that $d_i \ne b_{k_i}$ for all $i \ge 0$. Thus $d \ne b$. For
definiteness assume $b < d$. There exists $\delta p_1 > 0$ such that if $p \in [p' - \delta p_1, p']$ then
$f^i(g_2(p), p) \ne c$ whenever $f^i(g_2(p'), p') \ne c$ for any $i$ satisfying $b < i \le d$. Choose $p^* = \alpha_{k_{i_j}}$ for some
$j > 0$ large enough such that $p^* > p' - \delta p_1$. Note that by this choice of $p^*$, we know that
$f^b(g_2(p^*), p^*) = c$ and $f^d(g_4(p^*), p^*) = c$.

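Now recall the definition of the kneading invariant:

$$D(f_p, t) = 1 + \sum_{i=1}^{\infty} \theta_i(f_p)t^i$$

where

$$\theta_i(f_p) = \epsilon_1(f_p)\epsilon_2(f_p)\cdots\epsilon_i(f_p)$$
$$\epsilon_i(f_p) = \lim_{x \to c} \operatorname{sgn}(D_x f(f^i(x, p), p))$$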
We claim that 

|1 + E Wr'Wl > |1 + E k (fp*)f\ (D.37) 

8 = 1 8 = 1 

If this claim is true, the rest of the lemma follows. At this point we shall finish the proof 
of the lemma before coming back to the proof of the claim. 

From (D.35) and (D.36) we know that 

6 d - b (f p .) = +1 (D.38) 

Also, since g 2 (p) ^ g 4 {p) for p G [po,p'), and f d {g4{p*),P*) = c, we know f d {g2{p*),P*) = 
f d ~ b (c,p*) ^ c. Combining this result with the fact that /i is monotone on M x (p*) we 
see that if f d ~ b (c,p*) > c then f d ~ b has a maximum at x = c, which implies that f d ~ b+1 
must have a minimum at x = c. Otherwise, if / (c,p*) < c then / ~ has a minimum 
at x = c, and again f d ~ b+1 has a minimum at x = c. Thus we conclude that: 

e d _ h (f v ,) = -1. (D.39) 

Finally, combining (D.38) and (D.39) with the claim above we find that $|D(f_{p'}, t)| >
|D(f_{p^*}, t)|$. But since $p' > p^*$, this contradicts the assumption that the kneading invariant
of $f_p$ is monotonically decreasing with respect to $p$. This proves the lemma, except for
the proof of the claim, which we give below:
We now prove the claim given in (D.37) by induction on $i$. Suppose that $\theta_{i-1}(f_{p'}) =
\theta_{i-1}(f_{p^*})$. We shall show that $\theta_i(f_{p'}) \ge \theta_i(f_{p^*})$.

Since f b (g2(p'),p') = f b (g2(p*),P*) = c, we can see that 

sgn(Df(f(c,p))) = sgn(Df(f b+l (g 2 (p),p),p)) 

for either p = p' or p = p* . Since i?(n, y,po) does not cross the boundary of M for any 
y G Ixi we can see that either both f b+t (g2(p')iP') > c and f b+t (g2(p*)iP*) > cor both 
f b+t (g2(p')iP') < c and f b+t (g2(p*)iP*) < c since both (g2(p')iP') and (52 (p* ) , P* ) are 
on the boundary of M. Furthermore from our choice of p* and <5j>i > we know that 
if g l (c,p') ^ c then g 8 (c,p*) ^ c for < i < 6 — d. Consequently we can see that if 
g l (c,p') ^ c then 

£,■(/„') = e,-(/ p .)- (D-40) 

This in turn implies Oi(f p >) = 0i(f p *) since 0i(f p ) = e 8 '(/ p )^ 8 _i(/ p ). On the other hand, if 
g l (c,p') = c, then 0i(f p >) = +1 so we automatically know that 0i(f p >) > 6i(f p *). 

Finally, note that the 0i(f p >) > 0i(f P *) is satisfied for i = 1 since we have 0i(f p >) = 
0\(f p *) from (D.40) if g(c } p') = c and 0i(f p >) > 0\{f P *) if g( c iP') = c - This completes the 
proof of the claim. 

Lemma D.3.3 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Let $p_0 \in \mathrm{int}(I_p)$ and $M_p = \sup_{x \in I_x} |D_p f(x, p_0)|$. Given $x_0 \in I_x$
such that $n = N(x_0, f_{p_0}) < \infty$, then for each $p \in J(x_0, p_0)$:

$$|h'_{(x_0,p_0)}(p)| < \frac{M_p}{|D_x f(f^{n-1}(h_{(x_0,p_0)}(p), p), p)|} \sum_{i=0}^{n-1} \frac{1}{|D_x f^i(h_{(x_0,p_0)}(p), p)|}$$

Proof: In order to prove the lemma, we first need the following result (which can be
found, for example, on page 417 of [33]).

Claim: For any x G I x and n > 1 : 

n-l 

\D p f n (x,p)\ <M P J2 iDxr-'-'ifix.p)^)] (D.41) 



8 = 



Proof of claim: The proof is by induction on $n$. For $n = 1$ the claim is clearly true. By the
chain rule, for any $n > 1$:

$$D_p f^n(x,p) = D_p f(f^{n-1}(x,p),p) + D_x f(f^{n-1}(x,p),p)\, D_p f^{n-1}(x,p).$$

Thus we have the following:

$$\begin{aligned}
|D_p f^n(x,p)| &\le M_p + |D_x f(f^{n-1}(x,p),p)|\,|D_p f^{n-1}(x,p)| \\
&\le M_p + |D_x f(f^{n-1}(x,p),p)|\, M_p \sum_{i=0}^{n-2} |D_x f^{n-2-i}(f^i(x,p),p)| \\
&\le M_p + M_p \sum_{i=0}^{n-2} |D_x f^{n-1-i}(f^i(x,p),p)| \\
&\le M_p \sum_{i=0}^{n-1} |D_x f^{n-1-i}(f^i(x,p),p)|.
\end{aligned}$$

This completes the induction argument and proves the claim. 

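Returning to the proof of the lemma, we know that $f^n(h_{(x_0,p_0)}(p),p) = c$ for
$p \in J(x_0,p_0)$. Consequently,

$$\frac{d}{dp}\left[f^n(h_{(x_0,p_0)}(p),p)\right] = 0. \tag{D.42}$$

By the chain rule:

$$\frac{d}{dp}\left[f^n(h_{(x_0,p_0)}(p),p)\right]
= D_x f^n(h_{(x_0,p_0)}(p),p)\, h'_{(x_0,p_0)}(p) + D_p f^n(h_{(x_0,p_0)}(p),p). \tag{D.43}$$

Thus, combining (D.42) and (D.43), we have:

$$|h'_{(x_0,p_0)}(p)| = \frac{|D_p f^n(h_{(x_0,p_0)}(p),p)|}{|D_x f^n(h_{(x_0,p_0)}(p),p)|}. \tag{D.44}$$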

Let $x_p = h_{(x_0,p_0)}(p)$. Then, combining (D.41) and (D.44), we have:

$$\begin{aligned}
|h'_{(x_0,p_0)}(p)| &= \frac{|D_p f^n(x_p,p)|}{|D_x f^n(x_p,p)|} \\
&\le M_p \frac{\sum_{i=0}^{n-1} |D_x f^{n-1-i}(f^i(x_p,p),p)|}{|D_x f^n(x_p,p)|} \\
&= \frac{M_p}{|D_x f(f^{n-1}(x_p,p),p)|} \sum_{i=0}^{n-1} \frac{1}{|D_x f^i(x_p,p)|}
\end{aligned}$$

provided $p \in J(x_0,p_0)$. This proves the lemma.
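
The relation between $h'$ and the two partial derivatives of $f^n$ in (D.44) can be checked
numerically. The following Python sketch is only an illustration under stated assumptions: it
uses the quadratic family $f_p(x) = px(1-x)$ (the family treated in appendix E) as a stand-in for
a general family, and the helper names (`left_inverse`, `h`, `derivatives`) and the values
$p_0 = 3.9$, $n = 3$ are hypothetical choices, not taken from the text. It builds one branch
$h(p)$ with $f^n(h(p),p) = c$ from explicit inverse branches and compares a finite-difference
estimate of $h'(p_0)$ with $-D_p f^n / D_x f^n$.

```python
import math

C = 0.5  # turning point of the quadratic map (assumed family for this sketch)


def f(x, p):
    return p * x * (1.0 - x)


def left_inverse(y, p):
    """Left inverse branch of f(., p): the preimage of y lying in [0, 1/2]."""
    return 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * y / p))


def h(p, n):
    """One branch with f^n(h(p), p) = c, built from n left inverse branches."""
    x = C
    for _ in range(n):
        x = left_inverse(x, p)
    return x


def derivatives(x, p, n):
    """Return (f^n, D_x f^n, D_p f^n) at (x, p), accumulated by the chain rule."""
    dx, dp = 1.0, 0.0
    for _ in range(n):
        fx = p * (1.0 - 2.0 * x)  # D_x f at the current point
        fp = x * (1.0 - x)        # D_p f at the current point
        dp = fx * dp + fp         # D_p f^{k+1} from D_p f^k
        dx = fx * dx              # D_x f^{k+1} from D_x f^k
        x = f(x, p)
    return x, dx, dp


p0, n, step = 3.9, 3, 1e-6
x0 = h(p0, n)
fn, dx, dp = derivatives(x0, p0, n)

h_prime_formula = -dp / dx  # implicit differentiation of f^n(h(p), p) = c
h_prime_numeric = (h(p0 + step, n) - h(p0 - step, n)) / (2.0 * step)

print(fn, h_prime_formula, h_prime_numeric)
```

The two derivative estimates should agree to within the accuracy of the central difference, and
the first printed value should be close to $c = 1/2$.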

Lemma D.3.4 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that $p_0 \in \mathrm{int}(I_p)$, and $f_{p_0}$ satisfies (CE1).
Also, suppose that $x_0 \in I_x$ is such that $n = N(x_0,f_{p_0}) < \infty$, and
$\min_{0 \le i < n} |f^i(x_0,p_0) - c| = a_{x_0} > 0$. Then there exists a constant $C_1 > 0$
(independent of $x_0$) such that

$$|h'_{(x_0,p_0)}(p_0)| \le C_1 \frac{1}{a_{x_0}^2}.$$

Proof: From lemma D.3.3:

$$|h'_{(x_0,p_0)}(p_0)| \le \frac{M_p}{|D_x f(f^{n-1}(x_0,p_0),p_0)|}
\sum_{i=0}^{n-1} \frac{1}{|D_x f^i(x_0,p_0)|}. \tag{D.45}$$

From lemma D.2.7, we also know that $f_{p_0}$ satisfies condition (CE2). Thus, since
$f^n(x_0,p_0) = c$, we know there exists $K_E > 0$ such that
$|D_x f(f^{n-1}(x_0,p_0),p_0)| > K_E$. Substituting this into (D.45) we have:

$$|h'_{(x_0,p_0)}(p_0)| \le \frac{M_p}{K_E} \sum_{i=0}^{n-1} \frac{1}{|D_x f^i(x_0,p_0)|}. \tag{D.46}$$

From lemma D.2.11 we know that there exist $C > 0$ and $\lambda > 1$ such that:

$$|D_x f^i(x_0,p_0)| \ge C a_{x_0}^2 \lambda^i.$$

Then from (D.46),

$$|h'_{(x_0,p_0)}(p_0)| \le \frac{M_p}{K_E} \sum_{i=0}^{n-1} \frac{1}{C a_{x_0}^2 \lambda^i}
\le \frac{M_p}{K_E C a_{x_0}^2} \left( \frac{1}{1 - \lambda^{-1}} \right)
\le C_1 \frac{1}{a_{x_0}^2}$$

if we set $C_1 = \frac{M_p}{K_E C (1 - \lambda^{-1})}$. This proves the lemma.

Lemma D.3.5 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Let $p_0 \in I_p$ and suppose that $x_0 \in I_x$ is such that
$n = N(x_0,f_{p_0}) < \infty$ and $\min_{0 \le i < n} |f^i(x_0,p_0) - c| = a_{x_0} > 0$. Then for
any $0 < \beta < 1$ there exists $0 < C_2 < \frac{1}{2}$ such that if $x_1 \in I_x$ and
$p_1 \in I_p$ satisfy:

(1) $|p_1 - p_0| < C_2 a_{x_0}$,

(2) $|f^i(x_1,p_1) - f^i(x_0,p_0)| < C_2 a_{x_0}$ for $0 \le i < n$,

then

$$\frac{|D_x f^i(x_1,p_1)|}{|D_x f^i(x_0,p_0)|} \ge \beta^i$$

for $0 \le i \le n$.

Proof: Combining lemmas D.2.1 and D.2.2 with conditions (1) and (2) above, we find
that there exist $K_0 > 0$, $K_1 > 0$, and $K_2 > 0$ such that:

$$|D_x f(f^i(x_1,p_1),p_1) - D_x f(f^i(x_1,p_1),p_0)| \le K_0 |p_1 - p_0| < K_0 C_2 a_{x_0} \tag{D.47}$$

$$|D_x f(f^i(x_1,p_1),p_0) - D_x f(f^i(x_0,p_0),p_0)| \le K_1 |f^i(x_1,p_1) - f^i(x_0,p_0)| < K_1 C_2 a_{x_0} \tag{D.48}$$

$$|D_x f(f^i(x_0,p_0),p_0)| \ge K_2 |f^i(x_0,p_0) - c| \ge K_2 a_{x_0} \tag{D.49}$$

for all $0 \le i < n$.

From (D.47) and (D.48) we have:

$$\begin{aligned}
|D_x f(f^i(x_1,p_1),p_1) - D_x f(f^i(x_0,p_0),p_0)|
&\le |D_x f(f^i(x_1,p_1),p_1) - D_x f(f^i(x_1,p_1),p_0)| \\
&\quad + |D_x f(f^i(x_1,p_1),p_0) - D_x f(f^i(x_0,p_0),p_0)| \\
&< K_0 C_2 a_{x_0} + K_1 C_2 a_{x_0} = C_2 (K_0 + K_1) a_{x_0}
\end{aligned} \tag{D.50}$$

for all $0 \le i < n$.

Now set $C_2 = \min\{\frac{1}{2}, \frac{K_2}{K_0 + K_1}(1 - \beta)\}$. Then from (D.50) and (D.49):

$$\begin{aligned}
\frac{|D_x f(f^i(x_1,p_1),p_1)|}{|D_x f(f^i(x_0,p_0),p_0)|}
&\ge 1 - \frac{|D_x f(f^i(x_1,p_1),p_1) - D_x f(f^i(x_0,p_0),p_0)|}{|D_x f(f^i(x_0,p_0),p_0)|} \\
&\ge 1 - \frac{C_2 (K_0 + K_1) a_{x_0}}{K_2 a_{x_0}} \ge \beta
\end{aligned}$$

for all $0 \le i < n$. Thus we have:

$$\frac{|D_x f^i(x_1,p_1)|}{|D_x f^i(x_0,p_0)|}
= \prod_{j=0}^{i-1} \frac{|D_x f(f^j(x_1,p_1),p_1)|}{|D_x f(f^j(x_0,p_0),p_0)|} \ge \beta^i$$

if $0 \le i \le n$, which proves the lemma.

Lemma D.3.6 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Suppose that $p_0 \in \mathrm{int}(I_p)$, and $f_{p_0}$ satisfies (CE1).
Let $x_0 \in I_x$ be such that $n = N(x_0,f_{p_0}) < \infty$ and
$\min_{0 \le i < n} |f^i(x_0,p_0) - c| = a_{x_0} > 0$. Then there exist $C_3 > 0$ and $C_4 > 0$
(independent of $x_0$) such that

$$|h'_{(x_0,p_0)}(p)| \le C_3 \frac{1}{a_{x_0}^2}$$

if $p \in V(x_0,p_0)$, where $V(x_0,p_0) = [p_0, p_0 + \delta p_1]$, $\delta p_1 = C_4 a_{x_0}^3$, and
$h_{(x_0,p_0)} : V(x_0,p_0) \to I_x$ is a $C^2$ function satisfying $h_{(x_0,p_0)}(p_0) = x_0$ and
$f^n(h_{(x_0,p_0)}(p),p) = c$ for all $p \in V(x_0,p_0)$.

Proof: From lemma D.3.1 we know that there exists a $C^2$ function $h_{(x_0,p_0)}(p)$ such that
$h_{(x_0,p_0)}(p_0) = x_0$ and $f^n(h_{(x_0,p_0)}(p),p) = c$ if $p \in J(x_0,p_0)$, where
$J(x_0,p_0) \subset I_p$ is an interval containing $p_0$. Also from lemma D.3.1 we know that there
exists a continuous function $g_{(x_0,p_0)}(p)$ satisfying $g_{(x_0,p_0)}(p_0) = x_0$ and
$f^n(g_{(x_0,p_0)}(p),p) = c$ for all $p \in I_p$.

By lemma D.2.11, there exist $C > 0$ and $\lambda > 1$ such that:

$$|D_x f^i(x_0,p_0)| \ge C a_{x_0}^2 \lambda^i \tag{D.51}$$

for any $0 \le i \le n$.

Now fix $\lambda_1$ with $1 < \lambda_1 < \lambda$ and let $\beta = \lambda_1 / \lambda < 1$. Then given
$g_{(x_0,p_0)}(p)$, we know from lemma D.3.5 that there exists a constant $0 < C_2 < \frac{1}{2}$
(dependent only on $\beta$) such that if $V(x_0,p_0) \subset I_p$ is the maximal interval satisfying
the following conditions:

(1) If $p \in V(x_0,p_0)$, then $|p - p_0| < C_2 a_{x_0}$.

(2) If $p \in V(x_0,p_0)$, then $|f^i(g_{(x_0,p_0)}(p),p) - f^i(x_0,p_0)| < C_2 a_{x_0}$ for $0 \le i < n$,

then $p \in V(x_0,p_0)$ implies that:

$$\frac{|D_x f^i(g_{(x_0,p_0)}(p),p)|}{|D_x f^i(x_0,p_0)|} \ge \beta^i \tag{D.52}$$

for any $0 \le i \le n$. Note that by fixing $\lambda_1$, we have also set the constants
$0 < \beta < 1$ and $0 < C_2 < \frac{1}{2}$, so these constants are fixed for the discussion that
follows.

Note, also, that from condition (2) above it is apparent that $g_{(x_0,p_0)}(p) \ne c$ for any
$p \in V(x_0,p_0)$. From lemma D.3.1, this implies that $V(x_0,p_0) \subset J(x_0,p_0)$, so that
$g_{(x_0,p_0)}(p) = h_{(x_0,p_0)}(p)$ is $C^2$ when $p \in V(x_0,p_0)$.

Now consider the sequence $\{y_{-i}\}_{i=0}^{n}$ where $y_{-i} = f^{n-i}(x_0,p_0)$, so that
$y_{-n} = x_0$ and $y_0 = c$. Then, from (D.51), (D.52), and our choice of $\beta$, we know that:

$$|D_x f^i(h_{(y_{-i},p_0)}(p),p)| \ge |D_x f^i(y_{-i},p_0)|\,\beta^i
\ge C a_{x_0}^2 \lambda^i \beta^i \ge C a_{x_0}^2 \lambda_1^i$$

if $p \in V(y_{-i},p_0)$ for any $0 \le i \le n$. Substituting this into lemma D.3.3 we find that if
$p \in V(x_0,p_0)$:

$$|h'_{(x_0,p_0)}(p)| \le \frac{M_p}{|D_x f(z(p),p)|}
\sum_{i=0}^{n-1} \frac{1}{|D_x f^i(h_{(x_0,p_0)}(p),p)|} \tag{D.53}$$

where $z(p) = f^{n-1}(h_{(x_0,p_0)}(p),p)$. Since $f_{p_0}$ satisfies (CE2) and $f(z(p),p) = c$, we can
bound $|Df(z(p_0),p_0)| > K_E$ for some constant $K_E > 0$ independent of $x_0$. Consequently,
from condition (2) above and lemma D.2.1, there must exist $K'_E > 0$ (independent of $x_0$)
such that $|Df(z(p),p_0)| > K'_E$ if $p \in V(x_0,p_0)$. Substituting this into (D.53) we have:

$$|h'_{(x_0,p_0)}(p)| \le \frac{M_p}{K'_E C a_{x_0}^2} \sum_{i=0}^{n-1} \frac{1}{\lambda_1^i}
\le \frac{M_p}{K'_E C a_{x_0}^2 (1 - \lambda_1^{-1})}.$$

Thus setting $C_3 = \frac{M_p}{K'_E C (1 - \lambda_1^{-1})}$, we have that

$$|h'_{(y_{-i},p_0)}(p)| \le C_3 \frac{1}{a_{x_0}^2} \tag{D.54}$$

for $0 \le i \le n$ if $p \in V(y_{-i},p_0)$. Of course, since $x_0 = y_{-n}$, this also implies that

$$|h'_{(x_0,p_0)}(p)| \le C_3 \frac{1}{a_{x_0}^2}$$

if $p \in V(x_0,p_0)$.

This places the proper bound on the derivative $h'_{(x_0,p_0)}(p)$. Now we need to find a
proper bound on the size of $V(x_0,p_0)$. Set

$$\delta p = \min\left\{ \frac{C_2}{2 C_3} a_{x_0}^3,\; C_2 a_{x_0},\; \sup(I_p) - p_0 \right\}. \tag{D.55}$$

We claim that if $[p_0, p_0 + \delta p] \subset V(y_{-(i-1)},p_0)$, then
$[p_0, p_0 + \delta p] \subset V(y_{-i},p_0)$. Also, it is clear that
$[p_0, p_0 + \delta p] \subset V(c,p_0) = V(y_0,p_0)$. So, by induction on $i$, this claim
implies that $[p_0, p_0 + \delta p] \subset V(y_{-n},p_0) = V(x_0,p_0)$. Thus if the claim is true,
then from (D.55), and since $a_{x_0}$ is bounded above, we know there exists $C_4 > 0$ such that
$[p_0, p_0 + \delta p_1] \subset V(x_0,p_0)$ where $\delta p_1 = C_4 a_{x_0}^3$. This proves the lemma.
Thus, all that is left to do is to prove the claim.

Suppose that the claim were not true. This means there exists $p_1 \in [p_0, p_0 + \delta p]$
such that $p_1 \notin V(y_{-i},p_0)$. From our specification of $V(x_0,p_0)$ and the intermediate
value theorem, it is apparent that the only way this can happen is if there exists some
$p_2 \in [p_0,p_1]$ such that

$$|h_{(y_{-i},p_0)}(p_2) - y_{-i}| = C_2 a_{x_0} \tag{D.56}$$

and $[p_0,p_2] \subset V(y_{-i},p_0)$.

However, by the mean value theorem, we know that

$$|h_{(y_{-i},p_0)}(p_2) - y_{-i}| = |h_{(y_{-i},p_0)}(p_2) - h_{(y_{-i},p_0)}(p_0)|
= |h'_{(y_{-i},p_0)}(p_3)|\,|p_2 - p_0| \tag{D.57}$$

for some $p_3 \in [p_0,p_2] \subset V(y_{-(i-1)},p_0)$. But from (D.54):

$$|h'_{(y_{-i},p_0)}(p_3)| \le C_3 \frac{1}{a_{x_0}^2}. \tag{D.58}$$

Combining (D.57), (D.58), and our choice of $\delta p$, we find that

$$|h_{(y_{-i},p_0)}(p_2) - y_{-i}| \le C_3 \frac{1}{a_{x_0}^2} |p_2 - p_0|
\le C_3 \frac{1}{a_{x_0}^2}\, \delta p \le \frac{1}{2} C_2 a_{x_0}$$

which contradicts (D.56) and proves the claim.

Lemma D.3.7 Let $\{f_p : I_x \to I_x \mid p \in I_p\}$ be a one-parameter family of mappings
satisfying (B0) and (B1). Given any $p_0 \in \mathrm{int}(I_p)$, $x_0 \in I_x$,
$p_1 \in \mathrm{int}(I_p)$, and $x_1 \in I_x$, suppose that $W(x_0) \subset I_x \times I_p$ is a
connected set that can be represented in the following way:

$$W(x_0) = \{(\alpha_{x_0}(t), \beta_{x_0}(t)) \mid t \in [0,1]\}$$

where $\alpha_{x_0} : [0,1] \to I_x$ and $\beta_{x_0} : [0,1] \to I_p$ satisfy the following properties:

1. $\alpha_{x_0}(t)$ and $\beta_{x_0}(t)$ are continuous.

2. $\beta_{x_0}(t)$ is monotonically increasing with respect to $t$.

3. $\alpha_{x_0}(0) = x_0$, $\alpha_{x_0}(1) = x_1$.

4. $\beta_{x_0}(0) = p_0$, $\beta_{x_0}(1) = p_1$.

Then there exist constants $\delta p > 0$ and $C > 0$ (independent of $x_0$) such that if
$|x_1 - x_0| > C |p_1 - p_0|^{1/3}$ and $|p_1 - p_0| < \delta p$, then

$$W(x_0) \cap R(n,y,p_0) \cap (I_x \times [p_0, p_0 + \delta p]) \ne \emptyset$$

for some $n > 0$ and $y \in I_x$ such that $y \ne x_0$.

Proof: We assume that $x_1 > x_0$ and $p_1 > p_0$ (the other cases are similar). From
lemma D.2.13, we know that there exist constants $K_0 > 0$ and $\epsilon_0 > 0$ so that for any
positive $\epsilon < \epsilon_0$, there is a $y \in (x_0, x_0 + \epsilon)$ such that $f^n(y,p_0) = c$ and
$\min_{0 \le i < n} |f^i(y,p_0) - c| > K_0 \epsilon$ for some $n > 0$. From lemma D.3.6, we know that
there exist constants $K_1 > 0$ and $K_2 > 0$ such that if

$$\delta p_\epsilon = K_1 (K_0 \epsilon)^3 \tag{D.59}$$

then for all $p \in [p_0, p_0 + \delta p_\epsilon]$:

$$|h'_{(y,p_0)}(p)| < K_2 \left( \frac{1}{K_0 \epsilon} \right)^2. \tag{D.60}$$

Thus given $x_0 \in I_x$, $x_1 \in I_x$, $p_0 \in \mathrm{int}(I_p)$, and $p_1 \in \mathrm{int}(I_p)$, choose

$$\epsilon = \frac{1}{K_0} \left( \frac{p_1 - p_0}{K_1} \right)^{1/3}. \tag{D.61}$$

Also, set $\delta p = K_1 (K_0 \epsilon_0)^3$. Note that this means $p_1 - p_0 < \delta p$ implies that
$\epsilon < \epsilon_0$, so that the results of the previous paragraph hold.

In particular, if we substitute (D.61) into (D.59), we find that
$\delta p_\epsilon = K_1 (K_0 \epsilon)^3 = p_1 - p_0$, so that from (D.60) we have that for all
$p \in [p_0,p_1]$:

$$|h'_{(y,p_0)}(p)| < K_2 \left( \frac{1}{K_0 \epsilon} \right)^2$$

for some $y \in (x_0, x_0 + \epsilon)$. Consequently:

$$\begin{aligned}
h_{(y,p_0)}(p_1) &\le h_{(y,p_0)}(p_0) + (p_1 - p_0) \sup_{p \in [p_0,p_1]} |h'_{(y,p_0)}(p)| \\
&< y + K_2 \left( \frac{1}{K_0 \epsilon} \right)^2 (p_1 - p_0) \\
&< (x_0 + \epsilon) + K_2 \left( \frac{1}{K_0 \epsilon} \right)^2 (p_1 - p_0) \\
&= x_0 + \frac{1 + K_0 K_1 K_2}{K_0 K_1^{1/3}} (p_1 - p_0)^{1/3} \\
&= x_0 + C (p_1 - p_0)^{1/3}
\end{aligned} \tag{D.62}$$

where $C = \frac{1 + K_0 K_1 K_2}{K_0 K_1^{1/3}}$.

Now suppose that $(x_1,p_1) \in W(x_0)$ where $x_1 - x_0 > C |p_1 - p_0|^{1/3}$. From (D.62) we know
that there exists a continuous function $h_{(y,p_0)}(p)$ such that
$(h_{(y,p_0)}(p),p) \in R(n,y,p_0)$ for all $p \in [p_0,p_1]$, where $h_{(y,p_0)}(p_0) = y > x_0$ and
$h_{(y,p_0)}(p_1) < x_1$. We are also given that $W(x_0)$ can be represented as
$W(x_0) = \{(\alpha_{x_0}(t), \beta_{x_0}(t)) \mid t \in [0,1]\}$. Using the Intermediate
Value Theorem, it can be shown that $h_{(y,p_0)}(\beta_{x_0}(t_1)) = \alpha_{x_0}(t_1)$ for some
$t_1 \in [0,1]$. This implies that

$$W(x_0) \cap R(n,y,p_0) \cap (I_x \times [p_0, p_0 + \delta p]) \ne \emptyset \tag{D.63}$$

which proves the lemma.

Proof of Theorem 3.3.2: Note that the theorem is trivial if $x_0 \in \mathrm{Bd}(I_x)$ (where
$\mathrm{Bd}(I_x)$ denotes the boundary of $I_x$). Otherwise, fix $p_0 \in \mathrm{int}(I_p)$ such that
$f_{p_0}$ satisfies (CE1) and suppose there exists $\delta p_1 > 0$ such that $D(f_p,t)$ is
monotonically decreasing for $p \in [p_0, p_0 + \delta p_1]$. Given any $x_0 \in \mathrm{int}(I_x)$ let:

$$X_n^-(x_0) = \{x \mid N(x,f_{p_0}) \le n \text{ and } x < x_0\}$$

$$X_n^+(x_0) = \{x \mid N(x,f_{p_0}) \le n \text{ and } x > x_0\}$$

Define the following functions $a^-_{n,x_0} : I_p \to I_x$ and $a^+_{n,x_0} : I_p \to I_x$:

$$a^-_{n,x_0}(p) = \sup_{x' \in X_n^-(x_0)} \{x \mid (x,p) \in R(n,x',p_0),\ p \in I_p\} \tag{D.64}$$

$$a^+_{n,x_0}(p) = \inf_{x' \in X_n^+(x_0)} \{x \mid (x,p) \in R(n,x',p_0),\ p \in I_p\} \tag{D.65}$$

It is apparent from our specification of $R(n,x,p_0)$ that $a^-_{n,x_0}(p)$ and $a^+_{n,x_0}(p)$ must be
continuous with respect to $p$.

First of all note that $a^-_{m,x_0}(p) \ge a^-_{n,x_0}(p)$ if $m > n$. Furthermore, we claim that for
any $n > 0$ there exists $m > n$ such that $a^-_{m,x_0}(p) > a^-_{n,x_0}(p)$ for all
$p \in [p_0, p_0 + \delta p_1]$. By lemma D.3.2 we know that if $D(f_p,t)$ is monotonically
decreasing for $p \in [p_0, p_0 + \delta p_1]$, then $R(n,x,p_0)$ and $R(n,x',p_0)$ do not intersect in
the region $I_x \times [p_0, p_0 + \delta p_1]$ provided $x \ne x'$. This implies that we can rewrite
(D.64) as:

$$a^-_{n,x_0}(p) = \sup\{x \mid (x,p) \in R(n,x^*_n,p_0)\} \tag{D.66}$$

where $x^*_n = \sup\{X_n^-(x_0)\}$. Also we know from lemma D.2.9 that given any $n > 0$
there exists some $m > n$ such that $x^*_m > x^*_n$. This proves the claim. Similarly, we also
can show that for any $n > 0$ there exists $m > n$ such that $a^+_{m,x_0}(p) < a^+_{n,x_0}(p)$ for all
$p \in [p_0, p_0 + \delta p_1]$.

Returning to the proof, we note that since $a^-_{n,x_0}(p)$ is monotonically increasing with
respect to $n$, and bounded above by $\sup I_x = 1$, there exists a function, $a^-_{x_0}(p)$, such
that the limit

$$a^-_{x_0}(p) = \lim_{n \to \infty} a^-_{n,x_0}(p) \tag{D.67}$$

converges pointwise. Now set

$$b^-_{x_0}(p) = \limsup_{t \to p} a^-_{x_0}(t) \tag{D.68}$$

and define

$$S^-(x_0) = \{(x,p) \mid \liminf_{t \to p} b^-_{x_0}(t) \le x \le \limsup_{t \to p} b^-_{x_0}(t)\}. \tag{D.69}$$

Similarly we can also define $S^+(x_0)$ as follows:

$$b^+_{x_0}(p) = \liminf_{t \to p} a^+_{x_0}(t)$$

$$S^+(x_0) = \{(x,p) \mid \liminf_{t \to p} b^+_{x_0}(t) \le x \le \limsup_{t \to p} b^+_{x_0}(t)\}.$$

The next step is to show that

$$S^-(x_0) \cap R(n,x,p_0) \cap (I_x \times [p_0, p_0 + \delta p_1]) = \emptyset \tag{D.70}$$

for any $x \ne x_0$ and any $n > 0$. This will be done in two parts. First we address the case
where $x < x_0$. We claim that (D.70) is true if $x < x_0$. Suppose the claim is not true. Then
from (D.64) there must exist some $(x',p') \in S^-(x_0)$ and $n > 0$ such that
$a^-_{n,x_0}(p') \ge x'$, where $p' \in [p_0, p_0 + \delta p_1]$. But we have already seen that for any
$n > 0$ there exists an $m > n$ such that $a^-_{m,x_0}(p) > a^-_{n,x_0}(p)$ for all
$p \in [p_0, p_0 + \delta p_1]$. Thus $a^-_{x_0}(p) > a^-_{n,x_0}(p)$ for any $n > 0$ if
$p \in [p_0, p_0 + \delta p_1]$. Consequently, since $a^-_{n,x_0}(p)$ is continuous:

$$\begin{aligned}
x' \le a^-_{n,x_0}(p') &= \liminf_{t \to p'} \limsup_{s \to t} a^-_{n,x_0}(s) \\
&< \liminf_{t \to p'} \limsup_{s \to t} a^-_{x_0}(s) = \liminf_{t \to p'} b^-_{x_0}(t)
\end{aligned}$$

which from (D.69) implies that $(x',p') \notin S^-(x_0)$. This is a contradiction which proves
the claim.

We now claim that $S^-(x_0) \cap R(n,x,p_0) \cap (I_x \times [p_0, p_0 + \delta p_1]) = \emptyset$ if
$x > x_0$. If this claim is not true, then from (D.65) we can see that there must exist some
$(x',p') \in S^-(x_0)$ and $n > 0$ such that $a^+_{n,x_0}(p') \le x'$, where
$p' \in [p_0, p_0 + \delta p_1]$. Furthermore there exists $m > n$ such that
$a^+_{m,x_0}(p) < a^+_{n,x_0}(p)$ for $p \in [p_0, p_0 + \delta p_1]$. Thus there exists
$\epsilon > 0$ such that $a^+_{m,x_0}(p') < x' - 2\epsilon$. Since $a^+_{m,x_0}(p)$ is continuous, this
implies that there exists $\delta > 0$ such that

$$a^+_{m,x_0}(p) < x' - \epsilon \tag{D.71}$$

for any $p$ such that $|p - p'| < \delta$. But since $(x',p') \in S^-(x_0)$,

$$\limsup_{t \to p'} \limsup_{s \to t} \lim_{n \to \infty} a^-_{n,x_0}(s) \ge x'.$$

Since $a^-_{n,x_0}(p)$ is continuous, this implies that for any $\delta > 0$ and $\epsilon > 0$ there
is an $n > 0$ and $p_1$ with $|p_1 - p'| < \delta$ such that $a^-_{n,x_0}(p_1) > x' - \epsilon$.
Combining this with (D.71) we see that there exists $p_2$ such that
$a^-_{n,x_0}(p_2) = a^+_{m,x_0}(p_2)$. But this is impossible by lemma D.3.2, because it implies
that $(x',p') \in R(m,x_1,p_0)$ and $(x',p') \in R(n,x_2,p_0)$ for some $n > 0$, $m > 0$,
$x_1 \ne x_2$, and $p' \in [p_0, p_0 + \delta p_1]$. This contradiction proves the claim.

The next step is to show that $S^-(x_0) \cup S^+(x_0)$ is invariant under $f$. We claim that
if $(x,p) \in S^-(x_0)$ then either $(f(x,p),p) \in S^-(f(x_0,p_0))$ or
$(f(x,p),p) \in S^+(f(x_0,p_0))$. For any $x_0 \in \mathrm{int}(I_x)$, there exists an
$\epsilon > 0$ such that $(x_0 - \epsilon, x_0) \subset (I_x \setminus \{c\})$. Let
$J = (x_0 - \epsilon, x_0)$. Then, since $f_{p_0}$ is a diffeomorphism on $J$, for any
$y_1 \in f(J,p_0)$ such that $n(y_1) = N(y_1,f_{p_0}) < \infty$, there exists $y_0 \in J$ such that
$y_1 = f(y_0,p_0)$ and $N(y_0,f_{p_0}) = n(y_1) + 1$. Consequently, from (D.66) we know that
there exists $N > 0$ such that for all $n > N$:

$$f(a^-_{n+1,x_0}(p),p) = \begin{cases}
a^-_{n,f(x_0,p_0)}(p) & \text{if } D_x f(x,p_0) > 0 \text{ on } J \\
a^+_{n,f(x_0,p_0)}(p) & \text{if } D_x f(x,p_0) < 0 \text{ on } J
\end{cases}$$

for any $p \in [p_0, p_0 + \delta p_1]$ if $x_0 \in \mathrm{int}(I_x)$. This result combined with our
specification of $S^-(x_0)$ in (D.67), (D.68), and (D.69) proves the claim. Using the analogous
result for $S^+(x_0)$ gives us that $S^-(x_0) \cup S^+(x_0)$ is invariant under $f$.

Finally, from the formulation of $S^-(x_0)$ in (D.69), it is apparent that there exists a
$W^-(x_0) \subset S^-(x_0)$ such that $W^-(x_0)$ can be represented in the following way:

$$W^-(x_0) = \{(\alpha_{x_0}(t), \beta_{x_0}(t)) \mid t \in [0,1]\}$$

where $\alpha_{x_0} : [0,1] \to I_x$ and $\beta_{x_0} : [0,1] \to I_p$ are continuous functions and
$\beta_{x_0}(t)$ is monotonically increasing with respect to $t$, with $\beta_{x_0}(0) = p_0$ and
$\beta_{x_0}(1) = p_0 + \delta p_1$. Of course, a similar $W^+(x_0) \subset S^+(x_0)$ also exists.

Putting it all together, we have now shown that: (1) $S^-(x_0) \cup S^+(x_0)$ is invariant
under $f$, and (2) $(S^-(x_0) \cup S^+(x_0)) \cap R(n,x,p_0) \cap (I_x \times [p_0, p_0 + \delta p_1]) = \emptyset$
for any $n > 0$ and any $x \ne x_0$. From property (2) above, lemma D.3.7, and since
$W^-(x_0) \subset S^-(x_0)$, it is apparent that there exist $\delta p_2 > 0$ and $C > 0$
(independent of $x_0$) such that if $(x,p) \in W^-(x_0)$ then $|x - x_0| < C (p - p_0)^{1/3}$. Set
$\delta p = \min\{\delta p_1, \delta p_2\}$ and let $W(x_0) = W^-(x_0)$ for
$p \in [p_0, p_0 + \delta p]$. Then property (1) implies that given any $x_0 \in \mathrm{int}(I_x)$, if
$(x,p) \in W(x_0)$ and $p \in [p_0, p_0 + \delta p]$, then
$|f^n(x,p) - f^n(x_0,p_0)| < C (p - p_0)^{1/3}$ for any $n \ge 0$. This proves the theorem.

Appendix E

Proof of theorem 3.4.2

This appendix contains the proof of theorem 3.4.2. For reference, the conditions (CE1)
and (CE2) can be found at the beginning of appendix D.

Theorem 3.4.2 Let $I_p = [0,4]$, $I_x = [0,1]$, and $f_p : I_x \to I_x$ be the family of quadratic
maps such that $f_p(x) = px(1 - x)$ for $p \in I_p$. Then there exist constants $\delta > 0$, $C > 0$,
$K > 0$, and a set $E(\gamma) \subset I_p$ with positive Lebesgue measure for every $\gamma > 1$ such that:

(1) If $\gamma > 1$ and $p_0 \in E(\gamma)$, then $f_{p_0}$ satisfies (CE1).

(2) If $f_{p_0}$ satisfies (CE1), then for any $\epsilon > 0$ sufficiently small, any orbit of $f_{p_0}$
can be $\epsilon$-shadowed by an orbit of $f_p$ for $p \in [p_0, p_0 + C\epsilon^3]$.

(3) If $\gamma > 1$ and $p_0 \in E(\gamma)$, then for any $\epsilon > 0$, almost no orbits of $f_{p_0}$ can be
$\epsilon$-shadowed by any orbit of $f_p$ for $p \in (p_0 - \delta, p_0 - (K\epsilon)^\gamma)$.

That is, the set of possible initial conditions, $x_0 \in I_x$, such that the orbit
$\{f^i_{p_0}(x_0)\}_{i=0}^{\infty}$ can be $\epsilon$-shadowed by some orbit of $f_p$ comprises at most a
set of Lebesgue measure zero on $I_x$ if $p \in (p_0 - \delta, p_0 - (K\epsilon)^\gamma)$.
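
To make the notion used in the theorem concrete, the following Python sketch tests whether one
orbit of the quadratic family stays within $\epsilon$ of another. It is an illustration only and
rests on assumptions that are not part of the text: shadowing is checked over a finite window of
`n` iterates rather than the full infinite orbit, the criterion is taken to be the pointwise
distance $|f^i_{p_0}(x_0) - f^i_p(y_0)| < \epsilon$, and the function names are hypothetical. It
does not search for a shadowing orbit; it only checks a given candidate.

```python
def orbit(x, p, n):
    """First n + 1 points of the orbit of x under f_p(x) = p * x * (1 - x)."""
    points = [x]
    for _ in range(n):
        x = p * x * (1.0 - x)
        points.append(x)
    return points


def max_deviation(x0, p0, y0, p, n):
    """Largest pointwise distance between the two orbits over n iterates."""
    xs = orbit(x0, p0, n)
    ys = orbit(y0, p, n)
    return max(abs(a - b) for a, b in zip(xs, ys))


def eps_shadows(x0, p0, y0, p, n, eps):
    """True if the orbit of (y0, p) stays within eps of the orbit of (x0, p0)
    for the first n iterates (finite-window version of the definition)."""
    return max_deviation(x0, p0, y0, p, n) < eps


# The same initial condition under a slightly different parameter drifts away.
print(max_deviation(0.3, 3.9, 0.3, 3.9, 50))
print(max_deviation(0.3, 3.9, 0.3, 3.8, 50))
```

Because the map is chaotic for typical parameters near 4, a candidate orbit that starts close
generally separates after a few dozen iterates, which is why the existence of a shadowing
orbit for a nearby parameter is a nontrivial statement.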

Proof of Theorem 3.4.2: We first address parts (1) and (3) of the theorem and come back
to part (2) at the end of the proof.

The basic idea behind parts (1) and (3) is to apply theorem 3.3.1 to theorem 3.4.1.
There are four major steps. We first set lower bounds on the return time of the orbit of
the turning point, $c = \frac{1}{2}$, to neighborhoods of $c$. Next we show that $f_p$ satisfies
(CP1) and favors higher parameters on a positive measure of parameter values. This allows us
to apply theorem 3.3.1. Finally we show that almost every orbit of these maps approaches
arbitrarily close to $c$, so that if the orbit $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$ cannot be shadowed
then almost all other orbits of $f_{p_0}$ cannot be shadowed either.

We first show that there is a set of parameters of positive measure such that orbits
of the turning point, $\{f^i_p(c)\}_{i=0}^{\infty}$, do not return too quickly to neighborhoods of
$c$. This can be seen from the construction used to prove theorem 3.4.1. In [5] it is shown that
for any $\alpha > 0$, if $S(\alpha) \subset I_p$ is the set of parameters $p_0$ such that $f_{p_0}$
satisfies both (CE1) and:

$$|f^i_{p_0}(c) - c| \ge e^{-\alpha i} \tag{E.1}$$

for all $i \in \{0,1,2,\ldots\}$, then $S(\alpha)$ has a density point at $p = 4$.

We now show that (CP1) is also satisfied on a positive measure of parameter values.
First consider what happens if $p = 4$:

$$D_p f(c,p=4) = \frac{1}{4} \tag{E.2}$$

$$D_p f(f^n(c,p=4),p=4) = 0 \quad \text{for any } n \ge 1 \tag{E.3}$$

$$|D_x f(f^n(c,p=4),p=4)| = 4 \quad \text{for any } n \ge 1 \tag{E.4}$$

$$|D_x f^{n-1}(f(c,p=4),p=4)| = 4^{n-1} \quad \text{for any } n \ge 1. \tag{E.5}$$
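
The values at $p = 4$ follow directly from $f_p(x) = px(1-x)$, since the turning point maps to
1 and then to the fixed point 0. The short Python sketch below evaluates them; the function
names `f`, `dfdx`, and `dfdp` are illustrative helpers introduced here, not notation from the
text.

```python
def f(x, p):
    return p * x * (1.0 - x)


def dfdx(x, p):
    """Partial derivative of f with respect to the state x."""
    return p * (1.0 - 2.0 * x)


def dfdp(x, p):
    """Partial derivative of f with respect to the parameter p."""
    return x * (1.0 - x)


c, p = 0.5, 4.0
orbit = [c]
for _ in range(6):
    orbit.append(f(orbit[-1], p))

print(orbit)                                 # c, f(c), f^2(c), ...
print(dfdp(c, p))                            # D_p f at the turning point
print([dfdp(x, p) for x in orbit[1:]])       # D_p f along f^n(c), n >= 1
print([abs(dfdx(x, p)) for x in orbit[1:]])  # |D_x f| along f^n(c), n >= 1
```

All quantities here are exactly representable in floating point, so the printed values can be
compared directly with $\frac{1}{4}$, 0, and 4.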

It is also a simple matter to verify that $f_p$ favors higher parameters at $p = 4$. Note that
from the chain rule we have that:

$$D_p f^n(c,p) = D_x f(f^{n-1}(c,p),p)\, D_p f^{n-1}(c,p) + D_p f(f^{n-1}(c,p),p) \tag{E.6}$$

for any $n \ge 1$ and any $p \in I_p$. Consequently, using continuity arguments we can see that
for any $N > 0$ and $\delta > 0$ there exists $\epsilon_1 > 0$ such that $p \in [4 - \epsilon_1, 4]$
implies that both of the following hold:

$$|D_p f(c,p)| > \frac{1}{4} - \delta \tag{E.7}$$

$$|D_p f(f^n(c,p),p)| < \delta \quad \text{for any } n \in \{2,3,\ldots,N\}. \tag{E.8}$$
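
The chain-rule recursion (E.6) is straightforward to evaluate numerically. The following Python
sketch is a minimal illustration, assuming the quadratic family $f_p(x) = px(1-x)$; the function
names and the test parameter $p = 3.9$ are hypothetical choices made here. It accumulates
$D_p f^n(c,p)$ with the recursion and compares the result against a central finite difference
in $p$.

```python
def f_iter(x, p, n):
    """n-fold iterate f^n(x, p) of the quadratic map."""
    for _ in range(n):
        x = p * x * (1.0 - x)
    return x


def dp_iterate(x, p, n):
    """Return (f^n(x, p), D_p f^n(x, p)) using the chain-rule recursion."""
    d = 0.0  # D_p f^0 = 0, since f^0(x, p) = x does not depend on p
    for _ in range(n):
        # D_p f^{k+1} = D_x f(f^k, p) * D_p f^k + D_p f(f^k, p)
        d = p * (1.0 - 2.0 * x) * d + x * (1.0 - x)
        x = p * x * (1.0 - x)
    return x, d


def dp_finite_difference(x, p, n, step=1e-6):
    """Central-difference estimate of D_p f^n(x, p)."""
    return (f_iter(x, p + step, n) - f_iter(x, p - step, n)) / (2.0 * step)


c = 0.5
for n in range(1, 6):
    print(n, dp_iterate(c, 4.0, n)[1])

_, exact = dp_iterate(c, 3.9, 5)
print(exact, dp_finite_difference(c, 3.9, 5))
```

At $p = 4$ the recursion gives a parameter derivative whose magnitude grows by a factor of 4 per
iterate, which is the kind of exponential growth that the lower bound derived next makes
precise for nearby parameters.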

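From (E.6) we can see that:

$$\begin{aligned}
D_p f^n(c,p) &= D_p f(f^{n-1}(c,p),p) + \sum_{i=0}^{n-2} \left[ D_p f(f^i(c,p),p)
\frac{\prod_{j=1}^{n-1} D_x f(f^j(c,p),p)}{\prod_{j=1}^{i} D_x f(f^j(c,p),p)} \right] \\
&= \prod_{j=1}^{n-1} D_x f(f^j(c,p),p) \sum_{i=0}^{n-1}
\frac{D_p f(f^i(c,p),p)}{\prod_{j=1}^{i} D_x f(f^j(c,p),p)} \\
&= \prod_{j=1}^{n-1} D_x f(f^j(c,p),p) \left[ D_p f(c,p) + \sum_{i=1}^{n-1}
\frac{D_p f(f^i(c,p),p)}{\prod_{j=1}^{i} D_x f(f^j(c,p),p)} \right]
\end{aligned} \tag{E.9}$$

for any $n \ge 1$. But from theorem 3.4.1, we also know that there exist $K_E > 0$ and
$\lambda_E > 1$ and a set $E \subset I_p$ of positive measure such that if $p \in E$, then (CE1) is
satisfied for $f_p$:

$$\left| \prod_{j=1}^{n} D_x f(f^j(c,p),p) \right| = |D_x f^n(f(c,p),p)| \ge K_E \lambda_E^n.$$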
Substituting this into (E.9) we have:

$$|D_p f^n(c,p)| \ge K_E \lambda_E^{n-1} \left[ |D_p f(c,p)| -
\sum_{i=1}^{n-1} \frac{|D_p f(f^i(c,p),p)|}{K_E \lambda_E^i} \right].$$

Substituting (E.7) and (E.8):

$$|D_p f^n(c,p)| \ge K_E \lambda_E^{n-1} \left[ \left( \frac{1}{4} - \delta \right)
- \frac{\delta}{K_E (1 - \lambda_E^{-1})}
- \frac{\lambda_E^{-(N+1)}}{4 K_E (1 - \lambda_E^{-1})} \right]$$

for any $n \ge 1$. Now if we set

$$C_E = \frac{1}{4} - \delta - \frac{\delta}{K_E (1 - \lambda_E^{-1})}
- \frac{\lambda_E^{-(N+1)}}{4 K_E (1 - \lambda_E^{-1})}$$

we see that $C_E > 0$ if $\delta > 0$ is sufficiently small and $N > 0$ is sufficiently large. From
(E.7) and (E.8) we know that we have full control of $\delta > 0$ and $N > 0$ with our choice
of $\epsilon_1$. So choose $\epsilon_1 > 0$ small enough so that $C_E > 0$ for any
$p \in [4 - \epsilon_1, 4]$. Then we have that:

$$|D_p f^n(c,p)| \ge K_E C_E \lambda_E^{n-1} \tag{E.10}$$

for all $n \ge 1$ if $p \in [4 - \epsilon_1, 4]$ and $f_p$ satisfies (CE1) (i.e.,
$|D_x f^n(f(c,p),p)| \ge K_E \lambda_E^n$ for all $n \ge 1$). Looking at (E.6), it is also apparent
that if (E.10) is satisfied, then since $|D_p f(f^{n-1}(c,p),p)| \le \frac{1}{4}$, the sign of
$D_p f^n(c,p)$ is governed by the signs of $D_x f(f^{n-1}(c,p),p)$ and $D_p f^{n-1}(c,p)$ for
$n \ge 1$ sufficiently large. Thus, since $f_p$ favors higher parameters at $p = 4$, there exists
some $\epsilon > 0$ with $\epsilon < \epsilon_1$ such that $f_p$ favors higher parameters if
$p \in [4 - \epsilon, 4]$ and $f_p$ satisfies (CE1).

Consequently, (CP1) must be satisfied and $f_{p_0}$ favors higher parameters for any
$p_0 \in [4 - \epsilon, 4]$ such that $f_{p_0}$ satisfies (CE1). But recall that for any
$\alpha > 0$, $S(\alpha)$ has a density point at $p = 4$ and $p_0 \in S(\alpha)$ implies that
$f_{p_0}$ satisfies (CE1). So let $S^*(\alpha) = S(\alpha) \cap [4 - \epsilon, 4]$. Then for any
$\alpha > 0$ we can see that if $p_0 \in S^*(\alpha)$, then condition (E.1) is satisfied, $f_{p_0}$
satisfies (CE1), and $f_p$ satisfies (CP1) and favors higher parameters at $p = p_0$.
Furthermore, $S^*(\alpha)$ has a density point at $p = 4$.

Now recall from section 3.3.1 that $n_\epsilon(c,\epsilon,p_0)$ is defined to be the smallest
integer $n \ge 1$ such that $|f^n(c,p_0) - c| \le \epsilon$. Thus, if (E.1) is satisfied, then

$$n_\epsilon(c,\epsilon,p_0) \ge -\frac{1}{\alpha} \log \epsilon. \tag{E.11}$$
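
The return time $n_\epsilon(c,\epsilon,p_0)$ is easy to compute by direct iteration. The Python
sketch below is an illustration under assumptions made here: the function name `return_time`,
the iteration cap `max_iter`, and the sample values $p_0 = 3.9$ and $\epsilon = 0.1$ are
hypothetical, and the routine simply gives up (returning `None`) if the orbit of the turning
point does not come back within the cap.

```python
def return_time(p0, eps, c=0.5, max_iter=100000):
    """Smallest n >= 1 with |f^n(c, p0) - c| <= eps, or None if no such n
    is found within max_iter iterates of the quadratic map."""
    x = c
    for n in range(1, max_iter + 1):
        x = p0 * x * (1.0 - x)
        if abs(x - c) <= eps:
            return n
    return None


print(return_time(3.9, 0.1))
print(return_time(3.9, 0.01))
print(return_time(4.0, 0.1, max_iter=50))  # orbit of c lands on the fixed point 0
```

At $p_0 = 4$ the turning point maps to 1 and then stays at 0, so it never returns to a small
neighborhood of $c$ and the routine reports `None`.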

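But from theorem 3.3.1, we know that if $f_{p_0}$ satisfies (CE1) and $f_p$ satisfies (CP1) and
favors higher parameters at $p = p_0 \in I_p$, then there exist constants $\delta > 0$,
$K_0 > 0$, $K_1 > 0$ and $\lambda > 1$ such that there are no orbits of $f_p$ which
$\epsilon$-shadow the orbit $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$ if
$p \in (p_0 - \delta,\ p_0 - K_0 \epsilon \lambda^{-n_\epsilon(c,K_1\epsilon,p_0)})$. Substituting in
the condition (E.11) we find that:

$$K_0 \epsilon \lambda^{-n_\epsilon(c,K_1\epsilon,p_0)} \le K_0 (K_1 \epsilon)^{1 + \frac{1}{\alpha} \log \lambda}. \tag{E.12}$$

Now suppose we are given any $\gamma > 1$. We can see that if
$\alpha < \frac{1}{\gamma - 1} \log \lambda$ then

$$1 + \frac{1}{\alpha} \log \lambda > \gamma. \tag{E.13}$$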
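
The arithmetic behind (E.13) can be checked with a few lines of Python. This is a sketch under
assumptions introduced here: the sample values of $\gamma$ and $\lambda$ are arbitrary, the
function names are hypothetical, and $\alpha$ is taken to be half of the threshold
$\frac{1}{\gamma - 1}\log\lambda$ as one convenient choice that satisfies the strict inequality.

```python
import math


def alpha_for(gamma, lam):
    """One admissible alpha: half of the threshold log(lam) / (gamma - 1)."""
    return math.log(lam) / (2.0 * (gamma - 1.0))


def shadowing_exponent(alpha, lam):
    """The exponent 1 + (1 / alpha) * log(lam) that must exceed gamma."""
    return 1.0 + math.log(lam) / alpha


for gamma, lam in [(1.5, 2.0), (3.0, 1.1)]:
    alpha = alpha_for(gamma, lam)
    print(gamma, alpha, shadowing_exponent(alpha, lam))
```

With this choice the exponent equals $2\gamma - 1$, which exceeds $\gamma$ for every
$\gamma > 1$.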

So let $E(\gamma) = S^*\!\left(\frac{1}{2(\gamma - 1)} \log \lambda\right)$. Note that $E(\gamma)$ has
positive Lebesgue measure and a density point at $p = 4$. For any $\gamma > 1$, we also see that
if $p_0 \in E(\gamma)$ then $f_p$ satisfies (CP1) and (CE1) at $p = p_0$. Thus by theorem 3.3.1
and from (E.12) and (E.13) we have that if $p_0 \in E(\gamma)$ then no orbits of $f_p$
$\epsilon$-shadow the orbit $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$ for any
$p \in (p_0 - \delta,\ p_0 - K_0 (K_1 \epsilon)^\gamma)$. But since $\gamma > 1$, if we set the
constant $K = \max\{K_0 K_1, K_1\} > 0$ we see that
$p_0 - K_0 (K_1 \epsilon)^\gamma \ge p_0 - (K\epsilon)^\gamma$ for any $\epsilon > 0$. Thus, no
orbits of $f_p$ may $\epsilon$-shadow $\{f^i_{p_0}(c)\}_{i=0}^{\infty}$ if
$p \in (p_0 - \delta,\ p_0 - (K\epsilon)^\gamma)$.

The final step is to show that almost any orbit of $f_{p_0}$ comes arbitrarily close to $c$. This
can be seen from the following two lemmas:

Lemma E.0.8 Let $U$ be a neighborhood of $c$. For any $p_0 \in I_p$, if
$E_U = \{x \mid f^n_{p_0}(x) \in I_x \setminus U \text{ for all } n \ge 0\}$ contains no non-trivial
intervals, then the Lebesgue measure of $E_U$ is zero.

Proof of lemma E.0.8: See Theorem 3.1 in Guckenheimer [26].

Lemma E.0.9 If $p_0 \in I_p$ and $f_{p_0}$ satisfies (CE1), then the set of preimages of $c$,
$C_{p_0} = \bigcup_{i \ge 0} f^{-i}_{p_0}(c)$, is dense in $I_x$.

Proof of lemma E.0.9: See corollary II.5.5 in Collet and Eckmann [14].

From these two lemmas we can see that for almost all $x_0 \in I_x$, the orbit
$\{f^i_{p_0}(x_0)\}_{i=0}^{\infty}$ approaches arbitrarily close to $c$ if $p_0 \in E(\gamma)$, for any
$\gamma > 1$. Thus for almost all $x_0 \in I_x$, there are arbitrarily long stretches of iterates
where the orbit $\{f^i_{p_0}(x_0)\}_{i=0}^{\infty}$ looks arbitrarily close to the orbit
$\{f^i_{p_0}(c)\}_{i=0}^{\infty}$. This means that if there are no orbits of $f_p$ that can shadow
$\{f^i_{p_0}(c)\}_{i=0}^{\infty}$, there can be no orbits of $f_p$ that can shadow
$\{f^i_{p_0}(x_0)\}_{i=0}^{\infty}$. Consequently, for any $\gamma > 1$, if $p_0 \in E(\gamma)$ then
$f_{p_0}$ satisfies (CE1) and almost no orbits of $f_{p_0}$ can be shadowed by any orbit of $f_p$
if $p \in (p_0 - \delta,\ p_0 - (K\epsilon)^\gamma)$. This proves parts (1) and (3) of theorem 3.4.2.
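
The density of the preimages of $c$ asserted in lemma E.0.9 can be observed numerically. The
Python sketch below is an illustration only, with assumptions made here: it uses $p = 4$ purely
as a convenient sample parameter, truncates the union at a finite depth, and the function names
are hypothetical. It collects all preimages of $c$ up to a given depth and reports the largest
gap they leave in $[0,1]$.

```python
import math


def preimages(y, p):
    """Both preimages of y under f_p(x) = p * x * (1 - x), if they are real."""
    disc = 1.0 - 4.0 * y / p
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return [0.5 * (1.0 - root), 0.5 * (1.0 + root)]


def preimage_set(p, depth, c=0.5):
    """Sorted union of f^{-i}(c) for i = 0, ..., depth."""
    level, points = [c], {c}
    for _ in range(depth):
        level = [x for y in level for x in preimages(y, p)]
        points.update(level)
    return sorted(points)


def largest_gap(points):
    """Largest distance between neighbors, with 0 and 1 added as endpoints."""
    grid = [0.0] + points + [1.0]
    return max(b - a for a, b in zip(grid, grid[1:]))


for depth in (2, 4, 6, 8):
    pts = preimage_set(4.0, depth)
    print(depth, len(pts), largest_gap(pts))
```

The largest gap shrinks as the depth grows, so any orbit entering one of these gaps is forced
close to a point that eventually lands on $c$, which is the mechanism the argument above relies
on.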

Part (2) of theorem 3.4.2 is a direct result of Corollary 3.3.1, Theorem 3.4.1, and the
following result, due to Milnor and Thurston:

Lemma E.0.10 The kneading invariant, $D(f_p,t)$, is monotonically decreasing with respect
to $p$ for all $p \in I_p$.

Proof of lemma E.0.10: See theorem 13.1 in [34].

Thus if $f_{p_0}$ satisfies (CE1) for $p_0 \in E(\gamma)$, there exists a constant $C > 0$ such that
any orbit of $f_{p_0}$ can be $\epsilon$-shadowed by an orbit of $f_p$ if
$p \in [p_0, p_0 + C\epsilon^3]$. This is exactly part (2) of the theorem.

This concludes the proof of theorem 3.4.2.